Abstract: We study the performance of discrete-time consensus
protocols in the presence of additive noise. When the
consensus dynamic corresponds to a reversible Markov chain, we
give an exact expression for a weighted version of steady-state
disagreement in terms of the stationary distribution and hitting
times in an underlying graph. We then show how this result
can be used to characterize the noise robustness of a class of
protocols for formation control in terms of the Kemeny constant
of an underlying graph.

I. Introduction 

The design of policies for control and signal processing
in networks of agents (such as UAVs, vehicles, or sensors)
has attracted considerable attention over the past decades. It
is commonly believed that such policies benefit from being
distributed, for example by relying only on local, nearest-neighbor
interactions in a network of nodes. Understanding
how simple, distributed updates can accomplish global objectives
and giving quantifiable bounds on their performance has
correspondingly been an active area of research.

An emerging understanding is that a key tool for such
systems is the so-called consensus iteration, namely the update

$$x(t+1) = P x(t),$$

where $P$ is a stochastic matrix. This update has the property
that, subject to some technical assumptions on the matrix $P$,
the vector $x(t)$ converges to ${\rm span}\{\mathbf{1}\}$, the subspace spanned
by the all-ones vector. One usually thinks of each component
$x_i(t)$ as being controlled by a different “agent,” with the agents
asymptotically “coming to consensus” as all the components
of $x(t)$ approach the same value.
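
As a minimal illustration of the consensus iteration, the sketch below iterates $x(t+1) = Px(t)$ on a hypothetical three-node example (a lazy random walk on a path; the matrix, its stationary distribution, and the initial condition are illustrative assumptions, not taken from the paper). All components approach the weighted average $\pi^T x(0)$.

```python
def consensus_step(P, x):
    """Apply one step of x(t+1) = P x(t) for a row-stochastic matrix P."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


def run_consensus(P, x0, steps):
    """Iterate the noiseless consensus update for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(P, x)
    return x


# Hypothetical example: lazy random walk on a path with three nodes.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
pi = [0.25, 0.50, 0.25]   # stationary distribution of this particular P
x0 = [1.0, 0.0, 4.0]      # arbitrary initial condition

x_final = run_consensus(P, x0, steps=200)
limit = sum(p * v for p, v in zip(pi, x0))   # weighted average pi^T x(0)
print(x_final, limit)
```

In this example every entry of `x_final` agrees with `limit` up to floating-point error, which is the “coming to consensus” property in its simplest form.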

It turns out that many sophisticated network coordination
tasks can be either entirely reduced to consensus or have
decentralized solutions where the consensus iteration plays a
key role; we mention formation control [?], [30], [29], [23],
distributed optimization [?], [?], coverage control [?], [?],
distributed task assignment [?], [?], networked Kalman filtering
[?], [?], [?], [?], and cooperative flocking/leader-following
[?], [?], among many others.

For example, it is well known (and we spell out later in this
paper) that the problem of maintaining a formation when every
agent can measure the relative positions of its neighbors can
be solved with a distributed update rule which turns out to be
the consensus update after a change of variable.
As a consequence of the many applications of consensus, a
large literature has recently built up around it. In this paper, we
contribute to a strand of this literature which studies the effect
of noise; specifically, we study the noisy consensus iteration

$$x(t+1) = P x(t) + w(t), \qquad (1)$$

where the matrix $P$ is stochastic as before and the vector $w(t)$
represents zero-mean i.i.d. noise. Our goal in the present paper
is to contribute to an understanding of how much the “coming
to consensus” property deteriorates due to the addition of the
noise term $w(t)$ in Eq. (1). The main concern of this paper is
the notion of expected disagreement; namely we will consider
the average expected square deviation of the values $x_i(t)$ from
their (weighted) average as $t \to +\infty$.

Intuitively, the action of multiplying a vector $x(t)$ by a
stochastic matrix $P$ has the effect of bringing the components
$x_i(t)$ “closer together,” while the addition of the noise $w(t)$
counteracts that; the two processes might be expected to
balance in some equilibrium level of expected disagreement
as $t \to \infty$. We are motivated by the observation (made in the
previous literature on the subject and discussed extensively
later) that the equilibrium level of disagreement often grows
with the dimension of the matrix $P$.

Thus even though Eq. (1) can be shown to be stable
(under some mild technical assumptions) in the sense that
expected disagreement between any pair of nodes is bounded
as $t \to \infty$, this stability can be almost meaningless for
many classes of large systems in which expected disagreement
scales with dimension. This has implications for all distributed
protocols which rely on consensus, as it implies that in some
cases they may lack robustness under the addition of noise.
Building on previous work and contributing to the study of
this phenomenon is the goal of the present paper.

A. Literature review 

The analysis of the steady-state level of disagreement
in consensus with noise was initiated in [38] where,
for a symmetric matrix $P$ and under the assumption that the
components of $w(t)$ are uncorrelated, an explicit expression in
terms of the eigenvalues of $P$ was given. Recently [17] gave
an alternative expression in terms of the average resistance
associated with a graph based on the symmetric matrix $P$,
and further showed this expression can be used to bound the
steady-state disagreement from above and below in the more
general case when the stochastic matrix $P$ is not necessarily
symmetric but rather corresponds to a reversible Markov chain.

Continuous analogues of Eq. (1) have also been studied.
When the underlying dynamics comes from a symmetric graph
Laplacian, expressions for equilibrium disagreement in terms
of eigenvalues, resistances, and hitting times were presented
in [?]. When the underlying dynamics is not necessarily
symmetric but satisfies a normality property, expressions for
disagreement in terms of eigenvalues were given in [39].

The observation that Eq. (1) can have asymptotic disagreement
which grows with the size of the system was, to our
knowledge, first made in [?] (in continuous time). As observed
in [?] in the context of vehicular formation control, this means
that any protocol which relies on consensus iterations can
suffer from a considerable degradation of performance in large
networks. Furthermore, [?] showed that topology can have a
profound influence on performance by proving that while on
the ring graph the asymptotic disagreement grows linearly with
the number of nodes, it remains bounded on the 3D torus (and
grows only logarithmically in the number of nodes on the 2D
torus).

We next mention some other related strands of literature.
The paper [32] investigated consensus-like protocols
with noise in continuous time, focusing on connections with
measures of sparsity such as number of spanning trees, number
of cut-edges, and the degree-sequence. The related paper
[33] investigated several measures of robustness related to
equilibrium disagreement in terms of their convexity. The
paper [25] characterized steady-state disagreement in a number
of fractal graphs. The recent papers [39], [?] considered
the effects of noise in a continuous-time version of Eq. (1)
over directed graphs. In [39], explicit expressions for a
measure of steady-state disagreement were computed for a
number of such graphs. The paper [40] investigated steady-state
disagreement on trees and derived a partial ordering
capturing which trees have smaller steady-state disagreements.
In [26], noisy consensus with a drift term was considered with
a focus on ranking nodes in terms of their variance growth. Our
earlier work [?] focused on connections between asymptotic
disagreement and the Cheeger constant and coefficients of
ergodicity of the corresponding Markov chain. Moreover, we
mention the recent paper [36] which considered approximation
algorithms for the problem of designing networks that minimize
equilibrium disagreement. Finally, there is considerable
work on the leader selection problem, where only some of the
nodes are performing consensus iteration, which has a similar
flavor and which we do not survey here.

B. Our contributions 

Under the assumption that the matrix $P$ is reversible,
we give an exact expression for a weighted version of the
equilibrium disagreement where the disagreement at each node
is weighted proportionally to its importance in the graph
corresponding to the matrix $P$. Our expression is combinatorial in
that it involves hitting times and the stationary distribution
of $P$. Furthermore, we allow the noise $w(t)$ to have any
covariance matrix (though it must be i.i.d. in time). In the
previous literature such expressions were available only for
the special case when the matrix $P$ was symmetric and all the
noises $w_i(t)$ were uncorrelated.

This expression is the main result of this paper and it has
three main consequences. First, our expression allows us to
compute the weighted steady-state disagreement corresponding
to simple averaging on undirected graphs, when each
node puts an equal weight on all its neighbors. Updates of
this form are the canonical example of distributed averaging
algorithms. As a consequence, we are able to compute the
weighted steady-state disagreement for such updates on many
graphs, ranging from simple examples such as the line graph
and the star graph, to more sophisticated cases such as
Erdos-Renyi random graphs and dense regular graphs.

Secondly, our results lead to an explicit combinatorial
expression (again in terms of hitting times and the stationary
distribution of $P$) which provides the best known approximation
for the unweighted steady-state disagreement (where
the disagreement of each node is weighted equally). This
improves on the results of [17], which had the previously best
combinatorial approximation (in terms of graph resistances)
of unweighted disagreement.

Thirdly, this generalization allows us to apply our results to
the problem of formation control and characterize the noise
resilience of a natural class of first-order protocols for it.
It turns out that there is a very simple expression for the
noise resilience of a symmetric formation control protocol:
we show it is proportional to the so-called Kemeny constant
of an underlying graph. This observation is new and allows
for the easy computation of the scaling of noise resilience on
a variety of graphs.

Finally, we remark that our proof strategy is also of independent
interest. Most previous papers relied on
explicit diagonalization of the system of Eq. (1). This can be
made to work when the eigenvalues of $P$ are known and $P$ is
symmetric (allowing us to change variables without affecting
the covariance of the noise $w(t)$). However, this approach runs
into obstacles when either of these assumptions is not satisfied.
Here we introduce a different idea: we relate the recursions
for steady-state covariance of Eq. (1) to recursions for hitting
times on certain graphs.

C. The organization of this paper 

The main result of this paper, namely an exact expression
for weighted steady-state disagreement in noisy consensus, is
presented in Section II as Theorem 1. Section II also gives a proof
of this result and then discusses simplifications in a number of
special cases. Additionally, bounds on unweighted steady-state
disagreement are presented which improve on the current state
of the art from [17].

Section III then uses Theorem 1 to work out how disagreement
scales with the number of nodes for simple averaging
on a number of common graphs. Section IV introduces the
problem of understanding the performance of formation control
from noisy measurements of relative position, and, using
Theorem 1, characterizes this in terms of the Kemeny constant
of an underlying graph. Finally, simulations are provided
in Section V and the paper finishes with some concluding
remarks in Section VI.

II. Asymptotic disagreement in noisy consensus 

In this section, we state and prove our main result, which is
an explicit expression for the weighted steady-state disagreement
in noisy consensus. Additionally, we work out some simplifications
of our result for the case when the matrix $P$ from
Eq. (1) is symmetric and discuss connections to resistance,
Kemeny constant, and unweighted steady-state disagreement.

We begin with a concise statement of our main result,
starting with a number of definitions. For the remainder of this
paper, we will assume $P$ to be a stochastic, irreducible, and
aperiodic matrix, and we let $\pi$ be the stationary distribution
vector of the Markov chain with transition matrix $P$, i.e.,

$$\pi^T P = \pi^T, \qquad \sum_{i=1}^n \pi_i = 1.$$

Alternatively, $\pi$ is simply the unique normalized left-eigenvector
corresponding to the dominant eigenvalue of 1 of
the stochastic matrix $P$. We note that, for the remainder of the paper,
we will find it convenient to abuse notation by conflating the
matrix $P$ and the Markov chain whose transition matrix is $P$
(for example, we will refer to $\pi$ as the stationary distribution
of $P$).

We will use $D_\pi$ to stand for the diagonal matrix whose
$i$'th diagonal entry is $\pi_i$. Furthermore, we define the weighted
average vector,

$$\bar{x}(t) := \left( \sum_{i=1}^n \pi_i x_i(t) \right) \mathbf{1},$$

as well as the error vector

$$e(t) := x(t) - \bar{x}(t).$$

Intuitively, $e(t)$ measures how far away the vector $x(t)$ is
from consensus. Indeed, it is easy to see that the noiseless
update $x(t+1) = Px(t)$ has the property that $x(t)$ converges
to $\left(\sum_{i=1}^n \pi_i x_i(0)\right)\mathbf{1}$. The quantity $e(t)$ thus measures the
difference between the “current state” $x(t)$ and the limit of the
noiseless version of Eq. (1) starting from $x(t)$.

Our goal is to understand how big the error $e(t)$ can get as
$t$ goes to infinity. Due to the addition of noise $w(t)$ in Eq. (1),
the error vectors $e(t)$ are random variables. Recall that $w(t)$ is
zero-mean i.i.d., and we now introduce the notation $\Sigma_w$ for its
covariance. In order to measure deviation from consensus, we
will consider the following two linear combinations of squared
errors at each node,

$$\delta(t) := \sum_{i=1}^n \pi_i E[e_i^2(t)], \qquad \delta^{\rm uni}(t) := \frac{1}{n} \sum_{i=1}^n E[e_i^2(t)],$$

i.e., we weigh the squared error at each node either proportionally
to the stationary distribution of the node or uniformly.
Finally, our actual measures of performance will be the
asymptotic quantities

$$\delta_{\rm ss} := \limsup_{t \to \infty} \delta(t), \qquad \delta^{\rm uni}_{\rm ss} := \limsup_{t \to \infty} \delta^{\rm uni}(t),$$

which capture the limiting disagreement among the nodes. We
will refer to these quantities as weighted steady-state disagreement
and unweighted steady-state disagreement, respectively.

We will sometimes write $\delta_{\rm ss}(P, \Sigma_w)$ when the update matrix
$P$ and the noise covariance $\Sigma_w$ are not clear from context and
likewise for $\delta^{\rm uni}_{\rm ss}$.

Before stating our main result, let us recall the notion of
hitting time from node $i$ to node $j$ in a Markov chain: this
is the expected time until the chain visits $j$ for the first time
starting from node $i$. We use $H_M(i \to j)$ to denote this hitting
time in the Markov chain whose probability transition matrix
is $M$. By convention, $H_M(i \to i) = 0$ for all $i$. We will use
the notation $H_M$ to denote the matrix whose $i,j$'th element is
$H_M(i \to j)$. For a comprehensive treatment of hitting times,
the reader may consult the recent textbook [?].

With the above definitions in place, our next theorem
contains our main result on steady-state disagreement. We
remind the reader that, in addition to the stated assumptions
of the theorem, we are also assuming that $P$ is a stochastic,
irreducible, and aperiodic matrix for the remainder of the
paper.


Theorem 1. (An Explicit Expression for Weighted Steady-State
Disagreement) If the Markov chain with transition matrix
$P$ is reversible, then

$$\delta_{\rm ss}(P, \Sigma_w) = \pi^T H_{P^2} D_\pi \Sigma_w D_\pi \mathbf{1} - {\rm Tr}\left( H_{P^2} D_\pi \Sigma_w D_\pi \right).$$

The theorem characterizes $\delta_{\rm ss}$ in terms of combinatorial
quantities associated with an underlying Markov chain, namely
the stationary distribution and the hitting times. Inspecting
the theorem, we note that it expresses $\delta_{\rm ss}$ in terms of a
difference of two linear combinations of entries of the matrix
$H_{P^2} D_\pi \Sigma_w D_\pi$, both with nonnegative coefficients which add
up to $n$.

Furthermore, this theorem captures the intuition that not
all noises are created equal, in the sense that noise at key
locations should have a higher contribution to the limiting
disagreement. Indeed, in the event that noises at different
nodes are uncorrelated, the second term of Theorem 1 is easily
seen to be zero (since the matrix $H_{P^2}$ has zero diagonal by
definition) and we obtain

$$\delta_{\rm ss}(P, {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2)) = \sum_{i=1}^n \sigma_i^2 \pi_i^2 \sum_{j=1}^n \pi_j H_{P^2}(j \to i). \qquad (2)$$

We see that in this case $\delta_{\rm ss}$ is a linear combination of the
variances at each node, where the variance $\sigma_i^2$ is multiplied
by $\pi_i^2 \sum_{j=1}^n \pi_j H_{P^2}(j \to i)$. Note that this multiplier is a
product of a measure of importance coming from the stationary
distribution (i.e., $\pi_i^2$) and a measure of the “mean accessibility”
of a node (i.e., $\sum_{j=1}^n \pi_j H_{P^2}(j \to i)$).

In the event that all noises have the same variance, we obtain
the simplified version

$$\delta_{\rm ss}(P, \sigma^2 I) = \sigma^2 \sum_{i=1}^n \pi_i^2 \sum_{j=1}^n \pi_j H_{P^2}(j \to i). \qquad (3)$$
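
The formula of Theorem 1 can be checked numerically on small examples. The following is a minimal sketch, assuming a hypothetical three-node reversible chain and an illustrative correlated noise covariance (neither is taken from the paper): it evaluates the right-hand side of Theorem 1 exactly with rational arithmetic and compares it with the value obtained by iterating the error covariance recursion $\Sigma(t+1) = (P-J)\Sigma(t)(P-J)^T + (I-J)\Sigma_w(I-J)^T$ of Lemma 3, where $J = \mathbf{1}\pi^T$. The helper names are ours.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [list(map(F, row)) + [F(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def stationary(P):
    """Stationary distribution: pi^T P = pi^T with sum(pi) = 1."""
    n = len(P)
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    A[-1] = [1] * n   # replace one redundant equation by the normalization
    return solve(A, [0] * (n - 1) + [1])


def hitting_times(M):
    """H[i][j] = expected first time the chain M reaches j from i; H[j][j] = 0."""
    n = len(M)
    H = [[F(0)] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        A = [[(1 if i == k else 0) - M[i][k] for k in others] for i in others]
        for i, h in zip(others, solve(A, [1] * len(others))):
            H[i][j] = h
    return H


def delta_ss_theorem(P, Sigma_w):
    """Right-hand side of Theorem 1, evaluated exactly."""
    n = len(P)
    pi = stationary(P)
    H = hitting_times(matmul(P, P))   # hitting times of the chain P^2
    D = [[pi[i] if i == j else F(0) for j in range(n)] for i in range(n)]
    M = matmul(matmul(matmul(H, D), Sigma_w), D)   # H D_pi Sigma_w D_pi
    first = sum(pi[i] * M[i][j] for i in range(n) for j in range(n))   # pi^T M 1
    return first - sum(M[i][i] for i in range(n))


def delta_ss_recursion(P, Sigma_w, steps=200):
    """Iterate the error covariance recursion in floating point, from Sigma(0) = 0."""
    n = len(P)
    pi = [float(p) for p in stationary(P)]
    A = [[float(P[i][j]) - pi[j] for j in range(n)] for i in range(n)]   # P - J
    B = [[(1.0 if i == j else 0.0) - pi[j] for j in range(n)] for i in range(n)]   # I - J
    Q = matmul(matmul(B, [[float(v) for v in r] for r in Sigma_w]), transpose(B))
    S = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        S = matmul(matmul(A, S), transpose(A))
        S = [[S[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return sum(pi[i] * S[i][i] for i in range(n))


# Hypothetical example: lazy random walk on a three-node path (reversible, not symmetric).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
Sigma_w = [[2, 1, 0], [1, 3, 0], [0, 0, 1]]   # illustrative correlated noise covariance

exact = delta_ss_theorem(P, Sigma_w)
approx = delta_ss_recursion(P, Sigma_w)
print(float(exact), approx)
```

The two printed numbers should agree up to floating-point error. Exact rational arithmetic is used for the hitting times only to keep the comparison free of solver round-off; for larger matrices a floating-point linear solver would be the natural choice.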

As we discuss later in this paper, for many classes of matrices
$P$ the quantity $\sum_{i=1}^n \pi_i^2 \sum_{j=1}^n \pi_j H_{P^2}(j \to i)$ grows with the
total dimension of the system $n$. In other words, although the
system is technically stable in the sense of having bounded
expected disagreement as $t \to \infty$, this stability is almost
meaningless if $n$ is large. Equations (2) and (3) allow us to
determine when this is the case by analyzing how stationary
distribution and hitting times grow on various kinds of graphs.¹
Later in the paper (in Section III) we will use these equations
to work out how $\delta_{\rm ss}$ scales for a variety of matrices $P$ which
come from graphs.

¹ In particular, an implication is that the amount of noise amplification
in the network $\delta_{\rm ss}(P, \sigma^2 I)$ is not fully characterized by the spectral gap of
the underlying graph; see, for example, the table in Section III.

A. Proof of Theorem 1

We now turn to the proof of Theorem 1. We begin with an
informal sketch of the main idea.

First, the disagreement $\delta^{\rm uni}_{\rm ss}$ may be thought of as proportional
to the trace of a certain asymptotic covariance matrix
$\Sigma_{\rm ss}$ (to be formally defined later), whereas the weighted
disagreement $\delta_{\rm ss}$ may be thought of as the trace of $D_\pi \Sigma_{\rm ss}$
(since multiplication by the diagonal matrix $D_\pi$ scales the
$i$'th diagonal entry of covariance matrix $\Sigma_{\rm ss}$ by $\pi_i$). Now the
key observation is that we can write down a matrix $\widehat{\Sigma}$ with
the properties that (i) the trace of $\widehat{\Sigma}$ is the same as the trace
of $D_\pi \Sigma_{\rm ss}$ (ii) the difference between the matrix $\widehat{\Sigma}$ and the
matrix $-H D_\pi \Sigma_w D_\pi$ is a matrix with constant columns. These
two properties allow us to relate $\delta_{\rm ss}$ to the matrix $H D_\pi \Sigma_w D_\pi$
and very quickly lead to the proof of Theorem 1.

The existence of such a matrix $\widehat{\Sigma}$ is a technical observation
and we are not aware of any intuitive explanation or justification
for it. As a consequence, the proof given next is not
very intuitive and largely composed of the manipulation of
equations.

Specifically, we begin by deriving a recursion for the error
covariance matrix at time $t$ and show that, after a large number
of equation manipulations, in the limit as $t \to \infty$ it leads to
a certain representation of $D_\pi \Sigma_{\rm ss}$ as an infinite sum; we then
rearrange some parts of this sum to define the matrix $\widehat{\Sigma}$; finally,
we show that the matrix $\widehat{\Sigma}$ thus defined has the properties (i)
and (ii) above and immediately deduce Theorem 1.

We now begin the proof itself. The matrix $J$ defined as

$$J := \mathbf{1} \pi^T,$$

will be of central importance to the proof; the following lemma
collects a number of its useful properties.

Lemma 2 (Properties of the Matrix J).

$$\begin{aligned}
\bar{x}(t) &= J x(t), \\
J \mathbf{1} &= \mathbf{1}, \\
J P &= J, \\
P J &= J, \\
J^2 &= J, \\
(I - J)^2 &= I - J, \\
(P^l - J)^k &= P^{lk} - J, \quad l = 0, 1, 2, \ldots \ \text{and} \ k = 1, 2, \ldots, \\
\rho(P - J) &< 1.
\end{aligned}$$

Proof. The first six equations are immediate consequences of
the definitions of $J$, $P$, and $\pi$. The seventh equation can be
established by induction. Indeed, the base case of $k = 1$ is
trivial. If the identity is established for some $k$, then

$$\begin{aligned}
(P^l - J)^{k+1} &= (P^l - J)(P^l - J)^k \\
&= (P^l - J)(P^{lk} - J) \\
&= P^{l(k+1)} - P^l J - J P^{lk} + J \\
&= P^{l(k+1)} - J.
\end{aligned}$$

Note that some care is needed in applying the seventh equation
as it is obviously false when $k = 0$.

To prove the final inequality suppose that for some vector
$v \in \mathbb{C}^n$ and some $\lambda \in \mathbb{C}$,

$$(P - J) v = \lambda v.$$

If $\lambda \neq 0$, then

$$\pi^T v = \pi^T P v = \pi^T (P - J) v + \pi^T J v = \lambda \pi^T v + \pi^T v = (1 + \lambda) \pi^T v,$$

which implies that $\pi^T v = 0$. In turn, this implies that $Jv = 0$
and consequently $v$ is an eigenvector of $P$ with eigenvalue $\lambda$.
By stochasticity of $P$, this implies $|\lambda| \leq 1$.

To show the strict inequality, observe that since the matrix
$P$ is irreducible and aperiodic, we have that it has only one
eigenvector with an eigenvalue that has absolute value 1 and
that is the all-ones vector $\mathbf{1}$. Thus if $|\lambda| = 1$ then the vector $v$
is a multiple of $\mathbf{1}$; however, $\mathbf{1}$ is an eigenvector of $P - J$ with
eigenvalue zero which contradicts $|\lambda| = 1$. We conclude that
if $\lambda \neq 0$ then $|\lambda| < 1$, which is what we needed to show. □
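
The algebraic identities of Lemma 2 are easy to confirm on a concrete matrix. The sketch below, which assumes the same kind of hypothetical three-node lazy path chain used for illustration (with its stationary distribution supplied by hand), checks them in exact rational arithmetic; the function names are ours. The spectral radius claim is not checked here.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[F(1) if i == j else F(0) for j in range(n)] for i in range(n)]


def matpow(A, k):
    """A^k for an integer k >= 0 (A^0 is the identity)."""
    R = identity(len(A))
    for _ in range(k):
        R = matmul(R, A)
    return R


def check_lemma2(P, pi, max_l=3, max_k=4):
    """Return True if the algebraic identities of Lemma 2 hold for (P, pi)."""
    n = len(P)
    J = [[pi[j] for j in range(n)] for _ in range(n)]   # J = 1 pi^T
    I_minus_J = matsub(identity(n), J)
    ok = all(sum(row) == 1 for row in J)                # J 1 = 1
    ok = ok and matmul(J, P) == J and matmul(P, J) == J and matmul(J, J) == J
    ok = ok and matmul(I_minus_J, I_minus_J) == I_minus_J
    for l in range(max_l + 1):
        P_l = matpow(P, l)
        for k in range(1, max_k + 1):                   # the identity needs k >= 1
            lhs = matpow(matsub(P_l, J), k)
            ok = ok and lhs == matsub(matpow(P, l * k), J)
    return ok


# Hypothetical example: lazy random walk on a three-node path.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]   # stationary distribution, computed by hand
J = [[pi[j] for j in range(3)] for _ in range(3)]

lemma2_ok = check_lemma2(P, pi)
# With k = 0 the seventh identity fails: (P - J)^0 = I, while P^0 - J = I - J.
k0_fails = matpow(matsub(P, J), 0) != matsub(matpow(P, 0), J)
print(lemma2_ok, k0_fails)
```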


Next, we define the matrix

$$\Sigma(t) := E[e(t) e(t)^T].$$

The following lemma derives a recursion satisfied by $\Sigma(t)$.

Lemma 3 (Simplified Recursion for the Covariance Matrix).

$$\Sigma(t+1) = (P - J) \Sigma(t) (P - J)^T + (I - J) \Sigma_w (I - J)^T.$$

Proof. Indeed, using Lemma 2,

$$\begin{aligned}
e(t+1) &= x(t+1) - J x(t+1) \\
&= P x(t) + w(t) - J P x(t) - J w(t) \\
&= (P - J) x(t) + (I - J) w(t) \\
&= (P - J)(x(t) - \bar{x}(t)) + (I - J) w(t) \\
&= (P - J) e(t) + (I - J) w(t),
\end{aligned}$$

where the fourth line uses $(P - J)\bar{x}(t) = (P - J) J x(t) = 0$, and therefore,

$$\begin{aligned}
\Sigma(t+1) &= E[e(t+1) e(t+1)^T] \\
&= E\left[ \left( (P - J) e(t) + (I - J) w(t) \right) \left( e(t)^T (P - J)^T + w(t)^T (I - J)^T \right) \right]
\end{aligned}$$

and finally since $E[e(t) w(t)^T] = E[w(t) e(t)^T] = 0$, this
immediately implies the current lemma. □

Observe that

$$\delta_{\rm ss} = \limsup_{t \to \infty} \sum_{i=1}^n \pi_i [\Sigma(t)]_{ii}.$$
As a consequence of Lemma 3, it is not hard to see that the
initial condition $x(0)$ has no influence on $\delta_{\rm ss}$. Indeed, using
$\Sigma^0(t)$ to denote what $\Sigma(t)$ would be if $x(0) = 0$ we have that

$$\Sigma(t) = \Sigma^0(t) + (P - J)^t e(0) e(0)^T \left( (P - J)^T \right)^t.$$

Since $\rho(P - J) < 1$ by Lemma 2, we see that $\Sigma(t) - \Sigma^0(t) \to 0$.
Using $\delta^0_{\rm ss}$ to denote what $\delta_{\rm ss}$ would be if $x(0) = 0$, we have
that

$$\begin{aligned}
\delta_{\rm ss} - \delta^0_{\rm ss} &= \limsup_{t \to \infty} \sum_{i=1}^n \left( \pi_i [\Sigma^0(t)]_{ii} + \pi_i [\Sigma(t) - \Sigma^0(t)]_{ii} \right) - \limsup_{t \to \infty} \sum_{i=1}^n \pi_i [\Sigma^0(t)]_{ii} \\
&= 0.
\end{aligned}$$

Thus for the remainder of this paper, we will make the
assumption that $x(0) = 0$, i.e., that the initial condition is
the origin. This assumption will slightly simplify some of the
expressions which follow.

In our next corollary, we write down an explicit expression
for $\Sigma(t)$ as a sum.

Corollary 4 (Explicit Expression for the Covariance Matrix).
For $t \geq 1$,

$$\Sigma(t) = \sum_{k=0}^{t-1} (P^k - J) \Sigma_w \left( (P^T)^k - J^T \right).$$

Proof. Indeed, as we are now assuming that $x(0) = 0$, Lemma 3
implies that for $t \geq 1$,

$$\begin{aligned}
\Sigma(t) &= \sum_{k=0}^{t-1} (P - J)^k (I - J) \Sigma_w (I - J)^T (P^T - J^T)^k \\
&= (I - J) \Sigma_w (I - J^T) + \sum_{k=1}^{t-1} (P^k - J)(I - J) \Sigma_w (I - J)^T \left( (P^T)^k - J^T \right) \qquad (4)
\end{aligned}$$

where the last line used Lemma 2 for the equality $(P - J)^k = P^k - J$
when $k \geq 1$.

Next, observing that by Lemma 2 again if $k \geq 1$,

$$(P^k - J) J = (P - J)^k J = (P - J)^{k-1} (P - J) J = 0$$

and therefore if $k \geq 1$,

$$(P^k - J)(I - J) \Sigma_w (I - J)^T \left( (P^T)^k - J^T \right) = (P^k - J) \Sigma_w \left( (P^T)^k - J^T \right).$$

Plugging this into Eq. (4), we obtain the statement of the
corollary. □

Appealing once again to Lemma 2, we may rewrite the
previous corollary as

$$\Sigma(t) = (I - J) \Sigma_w (I - J)^T + \sum_{k=1}^{t-1} (P - J)^k \Sigma_w \left( (P - J)^T \right)^k.$$

Furthermore, by Lemma 2 the matrix $P - J$ has spectral radius
strictly less than 1. It follows that we can define

$$\Sigma_{\rm ss} := (I - J) \Sigma_w (I - J)^T + \sum_{k=1}^{\infty} (P - J)^k \Sigma_w \left( (P - J)^T \right)^k, \qquad (5)$$

and this is a valid definition since the sum on the right-hand
side converges. Moreover,

$$\Sigma_{\rm ss} = \lim_{t \to \infty} \Sigma(t).$$

Our next step is to observe that if we define $D_\pi := {\rm diag}(\pi_1, \pi_2, \ldots, \pi_n)$,
then the quantity $\delta_{\rm ss}$ we are seeking to
characterize can be written as

$$\delta_{\rm ss} = {\rm Tr}(\Sigma_{\rm ss} D_\pi). \qquad (6)$$

We therefore now turn our attention to the matrix $\Sigma_{\rm ss} D_\pi$.
Our next lemma derives an explicit expression for this matrix
as an infinite sum. The proof of this lemma is the only place
in the proof of Theorem 1 where we use the reversibility of
the matrix $P$.

Lemma 5 (Explicit Expression for the Weighted Covariance
Matrix).

$$\Sigma_{\rm ss} D_\pi = (I - J) \Sigma_w D_\pi (I - J) + \sum_{k=1}^{\infty} (P - J)^k \Sigma_w D_\pi (P - J)^k.$$

Proof. Indeed, from Eq. (5),

$$\Sigma_{\rm ss} D_\pi = (I - J) \Sigma_w (I - J)^T D_\pi + \sum_{k=1}^{\infty} (P - J)^k \Sigma_w (P^T - J^T)^k D_\pi. \qquad (7)$$

Now the reversibility of $P$ means that for all $i, j = 1, \ldots, n$,
we have that $\pi_i P_{ij} = \pi_j P_{ji}$. We can write this in matrix form
as

$$D_\pi P = P^T D_\pi.$$

One can also verify directly from the definitions of $J$ and $D_\pi$
that

$$D_\pi J = J^T D_\pi.$$

Plugging the last two equations into Eq. (7), we obtain the
statement of the lemma. □

Definition of the matrix $\widehat{\Sigma}$: we would now like to introduce
the matrix $\widehat{\Sigma}$ defined as

$$\widehat{\Sigma} := \sum_{k=0}^{\infty} (P^{2k} - J) \Sigma_w D_\pi. \qquad (8)$$

As before, by Lemma 2 we have that $\rho(P - J) < 1$, and
consequently the sum on the right-hand side converges and
$\widehat{\Sigma}$ is well defined. Furthermore, since ${\rm Tr}(AB) = {\rm Tr}(BA)$,
Lemma 5 immediately implies that

$${\rm Tr}(\widehat{\Sigma}) = {\rm Tr}(\Sigma_{\rm ss} D_\pi),$$

and putting this together with Eq. (6), we have

$${\rm Tr}(\widehat{\Sigma}) = \delta_{\rm ss}. \qquad (9)$$

Furthermore, since by Lemma 2 we have that $J(P^k - J) = 0$
for all $k \geq 0$, we have that

$$J \widehat{\Sigma} = 0. \qquad (10)$$
Finally, using Eq. (10) followed by Eq. (8) and Lemma 2, we
have the following sequence of equations:

$$\begin{aligned}
P^2 \widehat{\Sigma} &= (P^2 - J) \widehat{\Sigma} \\
&= \sum_{k=0}^{\infty} (P^2 - J)(P^{2k} - J) \Sigma_w D_\pi \\
&= (P^2 - J)(I - J) \Sigma_w D_\pi + \sum_{k=1}^{\infty} (P^2 - J)(P^{2k} - J) \Sigma_w D_\pi \\
&= (P^2 - J) \Sigma_w D_\pi + \sum_{k=1}^{\infty} (P^{2(k+1)} - J) \Sigma_w D_\pi \\
&= \sum_{k=0}^{\infty} (P^{2(k+1)} - J) \Sigma_w D_\pi \\
&= \sum_{k=1}^{\infty} (P^{2k} - J) \Sigma_w D_\pi \\
&= \widehat{\Sigma} - (I - J) \Sigma_w D_\pi
\end{aligned}$$

which we may rearrange as

$$\widehat{\Sigma} = P^2 \widehat{\Sigma} + (I - J) \Sigma_w D_\pi. \qquad (11)$$

With these identities in place, we are finally ready to prove
Theorem 1.

Proof of Theorem 1. Let us stack up the hitting times in the
Markov chain which moves according to $P^2$ in the matrix $H$,
i.e., $H_{ij} := H_{P^2}(i \to j)$. By conditioning on what happens
after a single step, we have the usual identity

$$H_{ij} = 1 + \sum_{k=1}^n [P^2]_{ik} H_{kj}, \qquad i \neq j.$$

On the other hand, since a random walk spends an expected
$1/\pi_i$ steps in between visits to node $i$,

$$H_{ii} = 0 = 1 + \sum_{k=1}^n [P^2]_{ik} H_{ki} - \frac{1}{\pi_i}.$$

We can write the previous two equations together in matrix form as

$$H = \mathbf{1}\mathbf{1}^T + P^2 H - D_\pi^{-1},$$

or

$$(I - P^2) H = \mathbf{1}\mathbf{1}^T - D_\pi^{-1}.$$

Multiplying both sides of this equation by $D_\pi \Sigma_w D_\pi$ on the
right, we obtain

$$(I - P^2) H D_\pi \Sigma_w D_\pi = (J - I) \Sigma_w D_\pi. \qquad (12)$$

On the other hand, observe that we may rearrange Eq. (11) as

$$(I - P^2) \widehat{\Sigma} = (I - J) \Sigma_w D_\pi. \qquad (13)$$

Adding Eq. (12) and Eq. (13), we obtain

$$(I - P^2) \left( \widehat{\Sigma} + H D_\pi \Sigma_w D_\pi \right) = 0,$$

meaning that all the columns of $\widehat{\Sigma} + H D_\pi \Sigma_w D_\pi$ lie in the null
space of $I - P^2$. But because $P$ is irreducible and aperiodic,
the null space of $I - P^2$ is ${\rm span}\{\mathbf{1}\}$. Thus $\widehat{\Sigma} + H D_\pi \Sigma_w D_\pi$ is
a matrix with constant columns. In other words, there exists a
vector $v$ such that

$$\widehat{\Sigma} = -H D_\pi \Sigma_w D_\pi + \mathbf{1} v^T. \qquad (14)$$

We can, in fact, compute $v$ exactly by utilizing Eq. (10),
which implies that

$$\mathbf{1} \pi^T H D_\pi \Sigma_w D_\pi = \mathbf{1} v^T.$$

Plugging this into Eq. (14), we obtain

$$\widehat{\Sigma} = -H D_\pi \Sigma_w D_\pi + \mathbf{1} \pi^T H D_\pi \Sigma_w D_\pi. \qquad (15)$$

Finally, recalling that $\delta_{\rm ss}$ is the trace of $\widehat{\Sigma}$ (see Eq. (9)),

$$\delta_{\rm ss} = -{\rm Tr}(H D_\pi \Sigma_w D_\pi) + \pi^T H D_\pi \Sigma_w \pi.$$

□

Having proven Theorem 1, we conclude the section with
a discussion of its simplifications in the case when $P$ is
symmetric, followed by an enumeration of some connections
it implies between the weighted steady-state disagreement
$\delta_{\rm ss}$, the unweighted steady-state disagreement $\delta^{\rm uni}_{\rm ss}$, and other
graph-theoretic quantities such as the electrical resistance and
the Kemeny constant.


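B. Simplifications of Theorem 1 in the symmetric case

In this subsection we collect several simplifications and
observations that pertain to symmetric matrices $P$. Thus for
the remainder of this Section II-B we will assume that $P$ is a
symmetric matrix. Some of the identities we derive in this brief
subsection will be new, whereas others will be simple proofs
of already known results. The main reason these results are
collected here is that we will need to use some of them in
Sections III and IV.

Since the symmetry of $P$ implies that $\pi = (1/n)\mathbf{1}$, we
immediately obtain that

$$\delta_{\rm ss}(P, \Sigma_w) = \frac{1}{n^3} \mathbf{1}^T H \Sigma_w \mathbf{1} - \frac{1}{n^2} {\rm Tr}(H \Sigma_w). \qquad (16)$$

Using the notation $\Sigma_w = [\sigma_{ij}]$ as well as the fact that $\Sigma_w$ is
symmetric, we may expand this expression to obtain

$$\delta_{\rm ss}(P, \Sigma_w) = \frac{1}{n^3} \sum_{i=1}^n \sum_{k=1}^n \sum_{l=1}^n H_{P^2}(k \to l) \sigma_{li} - \frac{1}{n^2} \sum_{i<j} \sigma_{ij} \left( H_{P^2}(i \to j) + H_{P^2}(j \to i) \right). \qquad (17)$$

It is also worthwhile to rewrite this as

$$\begin{aligned}
\delta_{\rm ss}(P, \Sigma_w) &= \frac{1}{n^3} \left( {\rm Tr}(H \Sigma_w \mathbf{1}\mathbf{1}^T) - {\rm Tr}(n H \Sigma_w) \right) \\
&= -\frac{1}{n^3} {\rm Tr}\left( H \Sigma_w (nI - \mathbf{1}\mathbf{1}^T) \right) \\
&= -\frac{1}{n^2} {\rm Tr}\left( H \Sigma_w (I - (1/n)\mathbf{1}\mathbf{1}^T) \right) \\
&= -\frac{1}{n^2} {\rm Tr}\left( H \Sigma_w P_{\mathbf{1}^\perp} \right) \qquad (18)
\end{aligned}$$

where $P_{\mathbf{1}^\perp} = I - (1/n)\mathbf{1}\mathbf{1}^T$ is the orthogonal projection
matrix onto the subspace $\mathbf{1}^\perp$.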
Equations (16), (17), and (18) are considerable simplifications
of Theorem 1. However, it is possible to simplify
Theorem 1 still further if we additionally assume that $\Sigma_w$
is diagonal, i.e., $\Sigma_w = {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2)$. In that case, the
second term on the right of Eq. (16) is zero and we obtain

$$\delta_{\rm ss}(P, {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2)) = \frac{1}{n^3} \sum_{i=1}^n \sum_{k=1}^n \sigma_i^2 H_{P^2}(k \to i). \qquad (19)$$

Finally, let us assume that the variances are all identical,
i.e., $\Sigma_w = \sigma^2 I$. In this case the answer can be written in a
particularly simple form in terms of the so-called Kemeny
constant.

Kemeny constant. A classic result of Kemeny sometimes
called the “random target lemma” shows that the quantity
$\sum_{j=1}^n \pi_j H_M(i \to j)$ is independent of $i$ for any Markov
chain $M$. The quantity $\sum_{j=1}^n \pi_j H_M(i \to j)$ is thus called
the Kemeny constant of the Markov chain and we will denote
it by $K(M)$.

With this in mind, from Eq. (19) we have that

$$\delta_{\rm ss}(P, \sigma^2 I) = \frac{\sigma^2 K(P^2)}{n}. \qquad (20)$$

Arguably, this is the simplest possible characterization of
$\delta_{\rm ss}$ for symmetric matrices $P$ and $\Sigma_w = \sigma^2 I$.

Moreover, we remark that this can be rewritten in terms of
the eigenvalues of the matrix $P$. Indeed, defining $\Lambda(M)$ to be
the set of all non-principal eigenvalues of $M$, it is known [?],
[12] that

$$K(M) = \sum_{\lambda \in \Lambda(M)} \frac{1}{1 - \lambda}. \qquad (21)$$

Putting the last two equations together, we have that for
symmetric $P$ with constant variances,

$$\delta_{\rm ss}(P, \sigma^2 I) = \frac{\sigma^2}{n} \sum_{\lambda \in \Lambda(P)} \frac{1}{1 - \lambda^2}.$$

This last identity is not a new result; rather, it was first
observed in [38] where it was proved directly by diagonalizing
$P$.
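
To make the Kemeny-constant characterization concrete, the following sketch works through a hypothetical symmetric example with $n = 3$ (the matrix and its eigenvalues are our own illustrative choices, computed by hand). It computes $\sum_j \pi_j H_{P^2}(i \to j)$ for every starting node $i$, confirms that the value does not depend on $i$, and compares it with the eigenvalue formula of Eq. (21) applied to $P^2$.

```python
from fractions import Fraction as F


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [list(map(F, row)) + [F(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def hitting_times(M):
    """H[i][j] = expected first time the chain M reaches j from i; H[j][j] = 0."""
    n = len(M)
    H = [[F(0)] * n for _ in range(n)]
    for j in range(n):
        others = [i for i in range(n) if i != j]
        A = [[(1 if i == k else 0) - M[i][k] for k in others] for i in others]
        for i, h in zip(others, solve(A, [1] * len(others))):
            H[i][j] = h
    return H


def kemeny_constants(M, pi):
    """sum_j pi_j H_M(i -> j) for every start i; the entries should coincide."""
    H = hitting_times(M)
    n = len(M)
    return [sum(pi[j] * H[i][j] for j in range(n)) for i in range(n)]


# Hypothetical symmetric example: P = (1/4) I + (1/4) 1 1^T on three nodes.
n = 3
P = [[F(1, 2) if i == j else F(1, 4) for j in range(n)] for i in range(n)]
pi = [F(1, n)] * n                       # symmetric P has uniform pi
K = kemeny_constants(matmul(P, P), pi)   # Kemeny constant of the chain P^2

# Non-principal eigenvalues of this P are 1/4 and 1/4 (computed by hand),
# so those of P^2 are 1/16 and 1/16.
K_from_eigs = sum(1 / (1 - lam ** 2) for lam in [F(1, 4), F(1, 4)])

sigma2 = F(1)                            # illustrative common noise variance
delta_ss = sigma2 * K[0] / n             # Eq. (20)
print(K, K_from_eigs, delta_ss)
```

For this example both routes give $K(P^2) = 32/15$, so that Eq. (20) yields $\delta_{\rm ss}(P, I) = 32/45$.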


Electrical resistance. We remark that it is possible to use
Theorem 1 to obtain a characterization of $\delta_{\rm ss}(P, \sigma^2 I)$ in terms
of electric resistances as first shown in [17] (see also [25] for
the analogous observation in continuous time).

Given a reversible stochastic matrix $M \in \mathbb{R}^{n \times n}$ with zero
diagonal, we define

$$q_M(x, y) := \pi_x M(x, y).$$

Note that reversibility of $M$ implies that $q_M(x, y) = q_M(y, x)$.
The quantity $R_M(a \leftrightarrow b)$ is defined to be the resistance from $a$
to $b$ in the electrical network where the edge $(i, j)$ is replaced
with a resistor with resistance $1/q_M(i, j)$.

There is a connection between resistances, thus defined, and
hitting times:

$$H_M(i \to j) + H_M(j \to i) = R_M(i \leftrightarrow j). \qquad (22)$$

A proof may be found in Section 10.3 of [?].

Using this identity along with the symmetry of the matrix
$P$ (which implies all $\pi_j$ equal $1/n$), we can group terms
together in Eq. (19) to obtain

$$\delta_{\rm ss}(P, \sigma^2 I) = \frac{\sigma^2 \sum_{i<j} R_{P^2}(i \leftrightarrow j)}{n^3}.$$

As mentioned above, this identity was first proved in [17].

C. Further connections to resistance, the Kemeny constant,
and unweighted steady-state disagreement.

We now turn our attention back to the case when $P$ is
reversible (and not necessarily symmetric). In this subsection,
we derive a number of inequalities bounding $\delta_{\rm ss}$ in terms of the
largest resistance and the Kemeny constant. We also discuss
how we can bound $\delta^{\rm uni}_{\rm ss}$ in terms of $\delta_{\rm ss}$. All the inequalities
derived within this subsection are new.

By putting Theorem 1 together with Eq. (22), we obtain

$$\begin{aligned}
\delta_{\rm ss}(P, {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2)) &= \sum_{i=1}^n \sum_{j=1}^n \sigma_i^2 \pi_i^2 \pi_j H_{P^2}(j \to i) \\
&\leq \left( \max_{i=1,\ldots,n} \sigma_i^2 \pi_i \right) \max_{i,j} R_{P^2}(i \leftrightarrow j).
\end{aligned}$$

In other words, $\delta_{\rm ss}$ may be quickly bounded in terms of the
largest variance, stationary distribution, and resistance. We can
also obtain a lower bound in terms of the smallest versions of
similar quantities. Indeed:

$$\begin{aligned}
\delta_{\rm ss}(P, {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2)) &= \sum_{i=1}^n \sum_{j=1}^n \sigma_i^2 \pi_i^2 \pi_j H_{P^2}(j \to i) \\
&\geq \left( \min_{i=1,\ldots,n} \sigma_i^2 \pi_i \right) \sum_{i=1}^n \sum_{j=1}^n \pi_i \pi_j H_{P^2}(j \to i) \\
&= \left( \min_{i=1,\ldots,n} \sigma_i^2 \pi_i \right) K(P^2).
\end{aligned}$$

These inequalities can be used to obtain quick bounds on $\delta_{\rm ss}$
when either the resistance or the Kemeny constant is known.

Bounding $\delta^{\rm uni}_{\rm ss}$. The problem of giving a combinatorial
characterization of $\delta^{\rm uni}_{\rm ss}(P, \Sigma_w)$ for reversible $P$ is open, to the best
of our knowledge. Here we provide combinatorial lower and
upper bounds on $\delta^{\rm uni}_{\rm ss}$ which are tighter than the best previously
known bounds.

Indeed, observe that

$$\delta(t) = \sum_{i=1}^n \pi_i E[e_i^2(t)] = \frac{1}{n} \sum_{i=1}^n n \pi_i E[e_i^2(t)],$$

so that

$$n \pi_{\min} \delta^{\rm uni}(t) \leq \delta(t) \leq n \pi_{\max} \delta^{\rm uni}(t),$$

which implies

$$\frac{\delta_{\rm ss}}{n \pi_{\max}} \leq \delta^{\rm uni}_{\rm ss} \leq \frac{\delta_{\rm ss}}{n \pi_{\min}}.$$

Thus as a consequence of Eq. (2), we have

$$\frac{1}{n \pi_{\max}} \sum_{i=1}^n \sum_{j=1}^n \sigma_i^2 \pi_i^2 \pi_j H_{P^2}(j \to i) \leq \delta^{\rm uni}_{\rm ss}(P, {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2))$$

and

$$\frac{1}{n \pi_{\min}} \sum_{i=1}^n \sum_{j=1}^n \sigma_i^2 \pi_i^2 \pi_j H_{P^2}(j \to i) \geq \delta^{\rm uni}_{\rm ss}(P, {\rm diag}(\sigma_1^2, \ldots, \sigma_n^2)).$$

This pair of bounds may be viewed as an improvement on the
results of [17]. That paper provided upper and lower bounds
on $\delta^{\rm uni}_{\rm ss}$ in terms of the stationary distribution and the electrical
resistance; the ratio of the upper and lower bounds given was
$(\pi_{\max}/\pi_{\min})^4$. By contrast, the ratio of the upper and lower
bounds in the two equations above is $\pi_{\max}/\pi_{\min}$.


III. Examples 

The goal of this section is to demonstrate that "back of the envelope" calculations based on Theorem 1 can often be used to give order-optimal estimates of $\delta_{ss}$. Indeed, we will obtain estimates of how $\delta_{ss}$ scales with the number of nodes on many common graphs. The interested reader may skip ahead to the table at the end of this section.

We begin by describing a natural way in which a stochastic matrix can be chosen from a graph. Given an undirected connected graph $G = (\{1, \ldots, n\}, E)$ without self-loops, let $d(i)$ denote the degree of node $i$, and let us define

$$P'_{ij} = \begin{cases} 1/d(i) & (i,j) \in E, \\ 0 & \text{else.} \end{cases} \tag{23}$$

Clearly, $P'$ is a stochastic matrix. However, if the graph $G$ is bipartite the quantity $\delta_{ss}(P', \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2))$ will be infinite if at least one of the $\sigma_i^2$ is strictly positive (see footnote 2). An easy fix for this is to consider instead

$$P = \frac{1}{2} I + \frac{1}{2} P'. \tag{24}$$

Intuitively, each agent will place half of its weight on itself and distribute half uniformly among neighboring agents. It is tautological that if $G$ is connected then $P$ is irreducible. Finally, observe that $P$ constructed this way is always reversible.
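The construction just described can be sketched in a few lines of Python. The snippet below builds the simple random walk matrix and its lazy version from an edge list, and checks stochasticity, stationarity of $d(i)/(2m)$, and reversibility (detailed balance) in exact rational arithmetic. The function names and the 4-node star used as input are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction

def walk_matrices(n, edges):
    """Return (simple, lazy, pi) for an undirected graph on nodes 0..n-1.

    simple[i][j] = 1/d(i) on edges, lazy = I/2 + simple/2,
    pi[i] = d(i)/(2m) with m the number of edges.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    simple = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        simple[i][j] = Fraction(1, deg[i])
        simple[j][i] = Fraction(1, deg[j])
    half = Fraction(1, 2)
    lazy = [[half * (1 if a == b else 0) + half * simple[a][b] for b in range(n)]
            for a in range(n)]
    pi = [Fraction(d, 2 * len(edges)) for d in deg]
    return simple, lazy, pi

def is_stationary(P, pi):
    """Check pi^T P = pi^T exactly."""
    n = len(P)
    return all(sum(pi[i] * P[i][j] for i in range(n)) == pi[j] for j in range(n))

def is_reversible(P, pi):
    """Check detailed balance pi_i P_ij = pi_j P_ji exactly."""
    n = len(P)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i]
               for i in range(n) for j in range(n))

# Illustrative input: a star with center 0 and three leaves.
star_edges = [(0, 1), (0, 2), (0, 3)]
simple, lazy, pi = walk_matrices(4, star_edges)
```

Using `Fraction` avoids floating-point tolerances in the equality checks; for larger graphs a floating-point version with a small tolerance would be the more practical choice.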

After attending to some preliminary remarks in the next subsection, we proceed to give order-optimal estimates of the quantity $\delta_{ss}(P, \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2))$ for a number of matrices $P$ constructed from graphs in this way.

2 We relegate the justification of this assertion to a footnote. Indeed, suppose that the graph $G$ is bipartite and let $V_1 \cup V_2 = \{1, \ldots, n\}$ be a bipartition. Then the vector $v$ defined as $v_i = d(i)$, $i \in V_1$ and $v_i = -d(i)$, $i \in V_2$ is a left-eigenvector of $P'$ with eigenvalue $-1$. Observe that $v^T \mathbf{1} = 0$ since both $\sum_{i \in V_1} d(i)$ and $\sum_{i \in V_2} d(i)$ count the number of edges going between $V_1$ and $V_2$. Thus

$$v^T e(t+1) = v^T x(t+1) = -v^T x(t) + v^T w(t) = -v^T e(t) + v^T w(t).$$

Letting $y(t) = (-1)^t v^T e(t)$ this becomes

$$y(t+1) = y(t) + (-1)^{t+1} v^T w(t).$$

Since $x(0) = 0$ we have $E[v^T e(t)] = 0$ and $E[y(t)] = 0$. Thus as long as at least one $\sigma_i^2$ is strictly positive, we have that $\mathrm{Var}(y(t)) \to +\infty$ and consequently $\mathrm{Var}(v^T e(t)) \to +\infty$. Since all $\pi_i$ are strictly positive due to the connectivity of $G$, it is not too hard to see that this implies that $\delta_{ss}$ is infinite.

Preliminary observations.

• We note that it is quite easy to compute the stationary distribution of a matrix defined from an undirected graph according to Eqs. (23)-(24). Indeed, letting $m$ be the number of edges in the graph $G$ which are not self loops, it is easy to verify that $\pi_i = d(i)/(2m)$ is the stationary distribution of $P'$. Naturally, this is also the stationary distribution of $P$ and $P^2$.

• We remind the reader that for two functions $f, g : X \to \mathbb{R}$, the notation $f(x) = \Theta(g(x))$ means that there exist positive numbers $c, C$ such that $c\,g(x) \le f(x) \le C\,g(x)$. We will sometimes write this as $f(x) \sim g(x)$.

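• Observe that on connected graphs where the total number of edges is linear in $n$, we have that $\pi_i = d(i)/(2m) = \Omega(1/n)$ for all $i$ and consequently $\delta_{ss}^{\rm uni} = O(\delta_{ss})$; if in addition all degrees are bounded, then $\pi_i \sim 1/n$ for all $i$ and $\delta_{ss}^{\rm uni} \sim \delta_{ss}$.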

• Let us adopt the notation $\mathcal{H}_M$ for the largest hitting time in the chain which moves according to the stochastic matrix $M$, i.e., $\mathcal{H}_M = \max_{i,j} H_M(i \to j)$. Then we have the following lemma.

Lemma 6. If $M$ is diagonally dominant, then

$$\mathcal{H}_{M^2} = O(\mathcal{H}_M).$$


Although this statement is elementary, we provide a proof 
for completeness. 


Proof. Consider any pair of nodes $i, j$ and let $T_M(i \to j)$ be the first time a random walk starting at $i$ and moving according to $M$ hits $j$, i.e., $T_M(i \to j)$ is the random variable whose expectation is $H_M(i \to j)$. Then, as a consequence of the diagonal dominance of $M$, we have that for any time $t$,

$$P(T_{M^2}(i \to j) \le t) \ge \frac{P(T_M(i \to j) \le 2t)}{2}. \tag{25}$$


This is true because: 


- The probability of the event $\{T_{M^2}(i \to j) \le t\}$ equals the probability that a random walk starting at $i$ and moving according to $M$ hits $j$ by time $2t$ at an even time step.

- Consider a sample path in the chain moving according to $M$ which starts at $i$ and ends when it hits $j$, which happens by time $2t$. Either (a) this sample path hits $j$ at an even time step, or (b) this sample path can be extended with a self-loop to further hit $j$ at an even time step by time $2t$ (and the probability of taking that self-loop is at least 1/2 by diagonal dominance).

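We next plug $t = \mathcal{H}_M$ into Eq. (25) to obtain

$$P(T_{M^2}(i \to j) \le \mathcal{H}_M) \ge \frac{P(T_M(i \to j) \le 2\mathcal{H}_M)}{2} \ge \frac{1}{4},$$

where the last step used Markov's inequality: since $E[T_M(i \to j)] \le \mathcal{H}_M$, the probability that $T_M(i \to j)$ exceeds $2\mathcal{H}_M$ is at most 1/2. Since this did not depend on the starting point $i$, we can iterate this argument to obtain that $E[T_{M^2}(i \to j)] \le 4\mathcal{H}_M$, which is what we needed to show. □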
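The bound of Lemma 6 can be checked numerically on a small diagonally dominant chain. The sketch below computes the largest hitting time for a lazy random walk on a path and for its square by solving the standard linear systems for expected hitting times. The choice of a 6-node path, the dense Gaussian elimination, and all function names are assumptions made for illustration only.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def max_hitting_time(M):
    """Largest expected hitting time max_{i,j} H_M(i -> j)."""
    n = len(M)
    best = 0.0
    for j in range(n):
        idx = [s for s in range(n) if s != j]
        # For i != j: h_i = 1 + sum_{k != j} M[i][k] h_k.
        A = [[(1.0 if a == b else 0.0) - M[a][b] for b in idx] for a in idx]
        best = max(best, max(solve(A, [1.0] * len(idx))))
    return best

def lazy_path(n):
    """Lazy random walk on a path: stay w.p. 1/2, else move to a uniform neighbor."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [k for k in (i - 1, i + 1) if 0 <= k < n]
        P[i][i] = 0.5
        for k in nbrs:
            P[i][k] = 0.5 / len(nbrs)
    return P

M = lazy_path(6)
H_M = max_hitting_time(M)
H_M2 = max_hitting_time(matmul(M, M))
```

Here `H_M2` should be at most `4 * H_M`, as the proof guarantees; it is also at least `H_M / 2`, since each step of the squared chain corresponds to two steps of the original one.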


• A consequence of the last bullet, together with the fact that $H_P(i \to j) = 2 H_{P'}(i \to j)$, is that $\mathcal{H}_{P^2} \sim \mathcal{H}_{P'}$.

• A convenient tool to compute upper bounds on hitting times in $P'$ is their connection to electric resistances. We refer the reader back to Section II-B for the definition of electric resistance $R_M(i \leftrightarrow j)$ and here merely recall the identity

$$H_M(i \to j) + H_M(j \to i) = R_M(i \leftrightarrow j). \tag{26}$$

For the simple random walk matrix $P'$ of Eq. (23) we have that for every pair of neighbors $x, y$,

$$q_{P'}(x, y) = \frac{d(x)}{2m} \cdot \frac{1}{d(x)} = \frac{1}{2m},$$

where recall $m$ is the number of edges in the graph $G$. Consequently, the resistance $R_{P'}(i \leftrightarrow j)$ can be obtained as the electrical resistance between $i$ and $j$ in a graph where every edge has resistance $2m$.
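To illustrate the identity in Eq. (26), the Python sketch below compares commute times with resistances for the simple (non-lazy) random walk on a path. On a path, the resistance between nodes $i < j$ is just a series connection of $j - i$ resistors of size $2m$, so no general resistance solver is needed. The path example and the function names are illustrative assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def hitting_time(M, i, j):
    """Expected number of steps for the chain M, started at i, to first reach j."""
    if i == j:
        return 0.0
    idx = [s for s in range(len(M)) if s != j]
    A = [[(1.0 if a == b else 0.0) - M[a][b] for b in idx] for a in idx]
    return solve(A, [1.0] * len(idx))[idx.index(i)]

def simple_path(n):
    """Simple random walk on a path with nodes 0..n-1 (zero diagonal)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [k for k in (i - 1, i + 1) if 0 <= k < n]
        for k in nbrs:
            P[i][k] = 1.0 / len(nbrs)
    return P

n = 5
m = n - 1                      # number of edges of the path
P1 = simple_path(n)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
commute = {(i, j): hitting_time(P1, i, j) + hitting_time(P1, j, i)
           for i, j in pairs}
# Series connection of (j - i) resistors, each of resistance 2m.
resistance = {(i, j): 2 * m * (j - i) for i, j in pairs}
```

For the end-to-end pair this gives a commute time of $2m(n-1) = 32$, matching the familiar quadratic growth of hitting times on a path.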


With these preliminary remarks in place, we now turn to the problem of computing $\delta_{ss}$ for matrices which come from graphs according to Eqs. (23)-(24). We will be assuming that $\Sigma_w = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ for the remainder of this section (and in places we will even consider the case when all $\sigma_i^2$ are equal to the same $\sigma^2$). As we will see next, we can use Theorem 1 as well as the above preliminary observations to estimate $\delta_{ss}$ to within a constant multiplicative factor for a number of common graphs.


The complete graph. By symmetry $\pi_i = 1/n$ for all nodes. Moreover, for every pair $i, j$ such that $i \ne j$, $H_{P^2}(j \to i) \sim n$. Thus by Theorem 1,

$$\delta_{ss} = \sum_{i=1}^n \sigma_i^2 \frac{1}{n^2} \sum_{j \ne i} \frac{1}{n} \Theta(n) \sim \frac{\sum_{i=1}^n \sigma_i^2}{n}.$$

This fact can also be obtained by an easy calculation directly from the definition of $\delta_{ss}$.


The circle graph. Once again, by symmetry we have that $\pi_i = 1/n$ for all nodes. An additional consequence of symmetry is that $H_P(j \to i) = H_P(i \to j)$, and so by Eq. (26) both of these quantities equal half of the resistance between nodes $i$ and $j$. That resistance can be computed by taking two parallel paths, one with length $|j - i|$ and the other with length $n - |j - i|$; each edge of the path has resistance $O(n)$. In the worst case, the resistance is quadratic, meaning that we can bound $\mathcal{H}_{P^2} = O(n^2)$. Thus by Theorem 1,

$$\delta_{ss} = \sum_{i=1}^n \sigma_i^2 \frac{1}{n^2} \sum_{j \ne i} \frac{1}{n} O(n^2) = O\left( \sum_{i=1}^n \sigma_i^2 \right).$$


The line graph. On the line graph, we have that the corner nodes have stationary distribution entries which are $\pi_i \sim 1/n$. By a standard "gambler's ruin" type argument, we have that $\mathcal{H}_P = O(n^2)$. Thus the calculation is the same as for the ring graph, i.e.,

$$\delta_{ss} = O\left( \sum_{i=1}^n \sigma_i^2 \right).$$

We remark that $\delta_{ss}^{\rm uni}$ has the same scaling, as a consequence of the fact that $\pi_i \sim 1/n$ for all $i$.


The star graph. Let us adopt the convention that node 1 is the center of the star and nodes $2, \ldots, n$ are the leaves. We then have that $\pi_1 \sim 1$ and $\pi_i \sim 1/n$ for $i = 2, \ldots, n$. Furthermore, $H_{P^2}(i \to 1) \sim 1$ for $i = 2, \ldots, n$ while $H_{P^2}(1 \to i) \sim n$ and $H_{P^2}(j \to i) \sim n$ for all $i, j$ with $i \ne j$ and $i, j \ne 1$.

Consequently,

$$\delta_{ss} \sim \sigma_1^2 \sum_{j=2}^n \frac{1}{n} \cdot 1 + \sum_{i=2}^n \sigma_i^2 \frac{1}{n^2} \left( 1 \cdot n + \sum_{k=2,\ldots,n,\; k \ne i} \frac{1}{n} \cdot n \right) \sim \sigma_1^2 + \frac{\sigma_2^2 + \cdots + \sigma_n^2}{n}.$$

As might be expected, noise at the center vertex contributes an order-of-magnitude more to $\delta_{ss}$ than noise at a leaf vertex with the same variance. We also remark that $\delta_{ss}^{\rm uni}$ is upper bounded by the above scaling since the total number of edges is linear.
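The gap between center and leaf contributions on the star can be seen numerically. The Python sketch below evaluates, for each node $i$, the coefficient $\pi_i^2 \sum_j \pi_j H_{P^2}(j \to i)$ that multiplies $\sigma_i^2$ in the hitting-time expression for $\delta_{ss}$ used in this section, for the lazy walk on a star. Node 0 plays the role of the center here (0-based indexing), and the graph size and function names are illustrative assumptions.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def hitting_times_to(M, j):
    """Vector of expected hitting times H_M(i -> j) over all start nodes i."""
    n = len(M)
    idx = [s for s in range(n) if s != j]
    A = [[(1.0 if a == b else 0.0) - M[a][b] for b in idx] for a in idx]
    h = solve(A, [1.0] * len(idx))
    out = [0.0] * n
    for a, val in zip(idx, h):
        out[a] = val
    return out

def star_coefficients(n):
    """Coefficient of sigma_i^2 in delta_ss for the lazy walk on an n-node star."""
    P = [[0.0] * n for _ in range(n)]
    P[0][0] = 0.5
    for leaf in range(1, n):
        P[0][leaf] = 0.5 / (n - 1)   # center spreads half its weight over leaves
        P[leaf][leaf] = 0.5
        P[leaf][0] = 0.5
    Q = matmul(P, P)
    pi = [0.5] + [0.5 / (n - 1)] * (n - 1)   # d(i)/(2m) with m = n - 1
    coeffs = []
    for i in range(n):
        h = hitting_times_to(Q, i)
        coeffs.append(pi[i] ** 2 * sum(pi[j] * h[j] for j in range(n)))
    return coeffs

coeffs = star_coefficients(8)
```

With 8 nodes the center coefficient is 1/4 while each leaf coefficient is 17/196, and the ratio between the two grows linearly as the star gets larger.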


The two-star graph. Consider two stars joined by a link connecting their centers. It is not hard to see that all hitting times in $P^2$ are $O(n)$, with the exception of hitting times from a leaf to its own center, which are $O(1)$ as before. Adopting the convention of having node 1 and node $n$ denote the two centers, we have that

$$\pi_1 = \pi_n \sim 1, \qquad \pi_k \sim \frac{1}{n} \text{ for all } k \ne 1, n.$$

Thus

$$\delta_{ss} \sim (\sigma_1^2 + \sigma_n^2) \left( 1 \cdot n + n \cdot \frac{1}{n} \cdot 1 + n \cdot \frac{1}{n} \cdot n \right) + \sum_{i=2}^{n-1} \sigma_i^2 \frac{1}{n^2} \left( 1 \cdot n + n \cdot \frac{1}{n} \cdot n \right) \sim n(\sigma_1^2 + \sigma_n^2) + \frac{\sum_{i=2}^{n-1} \sigma_i^2}{n}.$$

It is interesting to compare our results for the star graph with our results for the two-star graph. While on the star graph, noise at the center vertex contributes $O(n)$ times more to the limiting disagreement than noise at a leaf vertex, on the two-star graph the corresponding factor is $O(n^2)$. Furthermore, if all $\sigma_i^2$ are positive and bounded away from zero independently of $n$, the disagreement on the two-star graph is $O(n)$ while disagreement on the star graph is $O(1)$. One implication of these comparisons is that the diameter of the graph (which is constant for both the star and the two-star graph) does not determine the order of magnitude of $\delta_{ss}$.

Finally, we also remark that $\delta_{ss}^{\rm uni}$ is upper bounded by the above scaling since the total number of edges is linear.


The starry line graph. We now describe a graph on which $\delta_{ss}$ scales quadratically in the number of nodes $n$ when $\sigma_i^2 = \sigma^2$ for all $i$, an order of magnitude worse than on all the examples we have considered hitherto. We have not seen this graph described in the literature and we call it the starry line graph.

The construction of the graph is simple. We take a line graph on $n/3$ nodes and two star graphs on $n/3$ nodes (let us assume $n$ is divisible by 3). We join these three graphs together as follows: we put an edge between the center of the first star and the left-most vertex of the line and put an edge between the center of the second star and the right-most vertex of the line.

We first argue that $\delta_{ss}$ scales at least quadratically on this graph. Indeed, let node 1 be the center of the first star and let node $n$ be the center of the second star. Considering resistances and using Eq. (26), we immediately see that $H_P(1 \to n) + H_P(n \to 1) = \Theta(n^2)$. By symmetry, this implies that both $H_P(1 \to n)$ and $H_P(n \to 1)$ are $\Omega(n^2)$. Since $H_{M^2}(i \to j) = \Omega(H_M(i \to j))$, this implies that both $H_{P^2}(1 \to n)$ and $H_{P^2}(n \to 1)$ are also $\Omega(n^2)$. Since the stationary distribution at both nodes 1 and $n$ is lower bounded independently of $n$, we immediately obtain that $\delta_{ss} \ge \sigma^2 \pi_1^2 \pi_n H_{P^2}(n \to 1) = \sigma^2 \Omega(n^2)$.

To get that $\delta_{ss} \sim \sigma^2 n^2$, we argue that the contribution from all other pairs of nodes $i, j$ in Eq. (3) is not more than $\sigma^2 O(n^2)$. We will use the bound $H_{P^2}(j \to i) = O(n^2)$ for all $i, j$, which follows from $\mathcal{H}_P = O(n^2)$ from resistance analysis. Indeed, if neither of $i, j$ is 1 or $n$, the assertion we need follows since there are $O(n^2)$ such pairs, all with $\pi_i^2 \pi_j = O(1/n^3)$, so their contribution is $O(n^2 \cdot n^2 \cdot (1/n^3)) = O(n)$, which is certainly $O(n^2)$. For pairs $i, j$ when one of $i, j$ is 1 or $n$, we have that there are $O(n)$ such pairs with $\pi_i^2 \pi_j = O(1/n)$, so their contribution is $O(n \cdot (1/n) \cdot n^2) = O(n^2)$. This concludes the argument.


The two-dimensional grid. Let us assume that $n$ is a perfect square. The two-dimensional grid is the graph with the vertex set $\{(i,j) \mid i = 1, \ldots, \sqrt{n},\ j = 1, \ldots, \sqrt{n}\}$, and the edge set which is specified by the rule that $(i_1, j_1)$ and $(i_2, j_2)$ are connected if and only if $|i_1 - i_2| + |j_1 - j_2| = 1$. In other words, each node of the 2D grid is labeled by an integer point in the plane, with edges running left, right, up, and down between neighboring points.

By utilizing the formula $\pi_i = d(i)/(2m)$, we immediately have that $\pi_i \sim 1/n$ for all nodes. A standard argument (see Theorem 6.1 of El) shows that, with unit resistances on each edge, the largest resistance in the two-dimensional grid is $O(\log n)$. This means that using Eq. (26) to bound the commute time (which, recall, involves putting a resistor of resistance $2m = O(n)$ on every edge) we obtain that

$$\mathcal{H}_P = O(n \log n),$$

and consequently the same bound holds for $\mathcal{H}_{P^2}$. This implies that

$$\delta_{ss} = \left( \sum_{i=1}^n \sigma_i^2 \right) O\left( \frac{\log n}{n} \right).$$

Finally, note that since the degrees on this graph are all $\Theta(1)$, it follows that $\delta_{ss}$ and $\delta_{ss}^{\rm uni}$ are within a constant factor of each other, and consequently $\delta_{ss}^{\rm uni}$ satisfies the same scaling.

The d-dimensional grid with $d \ge 3$. We may define the $d$-dimensional grid analogously by associating the nodes with integer points in $\mathbb{R}^d$ and connecting neighbors. According to Theorem 6.1 of fH, the largest resistance between any two nodes in a $d$-dimensional grid with unit resistors on edges is $O(1/d)$. This becomes $O(n)$ when we put resistors of resistance $2m = O(nd)$ on each edge. An implication is that $\mathcal{H}_P = O(n)$. Since all degrees are within a factor of 2 of each other, we also have that $1/(2n) \le \pi_i \le 2/n$ for all nodes $i$. Putting this together gives

$$\delta_{ss} = \frac{O\left( \sum_{i=1}^n \sigma_i^2 \right)}{n}.$$

Finally, for the same reason as on the 2D grid, $\delta_{ss}^{\rm uni}$ satisfies the same scaling.


The complete binary tree. It is shown in Section 11.3.1 of m that for the complete binary tree on $n$ nodes, $\mathcal{H}_P = O(n \log n)$. Since all degrees are within a constant factor of each other, we have $\pi_i \sim 1/n$ for all nodes. We thus immediately have the same estimate as for the 2D grid, namely

$$\delta_{ss} = \left( \sum_{i=1}^n \sigma_i^2 \right) O\left( \frac{\log n}{n} \right).$$

Again since all degrees are within a constant factor of each other, $\delta_{ss}^{\rm uni}$ satisfies the same scaling.


Regular expander graphs. We first give (one of the) standard definitions of an expander graph. Given a graph $G = (\{1, \ldots, n\}, E)$ and a subset $V' \subset \{1, \ldots, n\}$ we introduce the notation $N(V')$ to denote the set of neighbors of nodes in $V'$, i.e., $N(V') = \{j \mid (i,j) \in E \text{ for some } i \in V'\}$. The graph $G$ is called an $\alpha$-expander if for every $V' \subset \{1, \ldots, n\}$ with $|V'| \le n/2$ we have $|N(V') - V'| \ge \alpha |V'|$.

It is Theorem 5.2 in 0 that a regular connected $\alpha$-expander with degree $d$ has resistance at most $O(1/(\alpha^2 d))$ with unit resistors on edges. As a consequence, all commute times in $P$ are bounded by $O((1/(\alpha^2 d)) \cdot dn) = O(n/\alpha^2)$ so that

$$\delta_{ss} = \sum_{i=1}^n \sigma_i^2 \frac{1}{n^2} \sum_{j=1}^n \frac{1}{n} O\left( \frac{n}{\alpha^2} \right) = O\left( \frac{1}{\alpha^2} \right) \cdot \frac{\sum_{i=1}^n \sigma_i^2}{n}.$$

Since the graph is regular, $\delta_{ss} = \delta_{ss}^{\rm uni}$ in this case.


Dense Erdős-Rényi random graphs. We next argue that

$$\delta_{ss} = O\left( \frac{\sum_{i=1}^n \sigma_i^2}{n} \right), \tag{27}$$

on an Erdős-Rényi random graph with high probability (see footnote 3), subject to assumptions we will spell out shortly. Note that in order to obtain such a result, we will use the fact that all stationary distribution entries are $\sim 1/n$ in magnitude and all hitting times are linear. The latter result is apparently available in the literature in [16] only for dense Erdős-Rényi random graphs.

3 A statement is said to hold with high probability if the probability that it does not hold approaches zero as $n \to \infty$.

More formally, we consider an undirected Erdős-Rényi random graph on $n$ nodes, meaning that each edge appears independently with a probability of $p_n$. Under the assumption that $(\log n)^{\log\log n}/(n p_n) \to 0$ as $n \to \infty$ (this means that the total number of edges in the random graph has expectation that grows slightly faster than $n$, namely faster than $n (\log n)^{\log\log n}$), it follows from the results of [16] that there exist constants $c, C$ such that with high probability we have that for all $i$,

$$cn \le \sum_{j=1}^n \pi_j H_{P'}(j \to i) \le Cn.$$

Thus $K(P') \sim n$, and therefore $K(P) \sim n$. Since diagonal dominance of $P$ implies its eigenvalues are nonnegative via Gershgorin circles, we have that $K(P^2) \le K(P)$ by Eq. (21), and we finally obtain that $K(P^2) = O(n)$ with high probability. Finally, since $\pi_i = d(i)/(2m)$ it is quite easy to see that all $\pi_i$ are on the order of $1/n$ with high probability; formally, we refer the reader to Lemma 3.2 of US). We thus have

$$\delta_{ss} = \frac{\sum_{i=1}^n \sigma_i^2}{n^2} O(n) = O\left( \frac{\sum_{i=1}^n \sigma_i^2}{n} \right).$$

Finally, $\delta_{ss}^{\rm uni}$ follows the same scaling under these assumptions since with high probability all $\pi_i$ are on the order of $1/n$.


Regular dense graphs. Let $G$ be a regular graph with degree $d \ge \lfloor n/2 \rfloor$. Then it is Theorem 3.3 in [4] that the largest resistance in such a graph with unit resistances on the edges is $O(1/n)$. If we put a resistor of size $2m = O(nd)$ on each edge, the largest resistance becomes $O(d)$. We thus have

$$\delta_{ss} = \sum_{i=1}^n \sum_{j=1}^n \sigma_i^2 \frac{1}{n^3} O(d) = O\left( \frac{\sum_{i=1}^n \sigma_i^2}{n} \right).$$

Once again, because on a regular graph $\delta_{ss} = \delta_{ss}^{\rm uni}$, we have that the same asymptotic holds for $\delta_{ss}^{\rm uni}$.

Regular graphs. We now argue that on any regular graph, $\delta_{ss} = O(\sigma_1^2 + \cdots + \sigma_n^2)$. In particular, this implies that the ring graph achieves the worst possible scaling for any regular graph. At first glance, this might not sound surprising since the ring graph is the sparsest connected regular graph; however, looking at the table at the end of this subsection, we see that there is no clear connection between $\delta_{ss}$ and sparsity.

This fact is an immediate consequence of the main result of 0, which implies that in a regular graph $\mathcal{H}_P = O(n^2)$. Since $\pi_i = 1/n$ for all $i$ due to regularity, we have that

$$\delta_{ss} = \sum_{i=1}^n \sum_{j=1}^n \sigma_i^2 \frac{1}{n^3} O(n^2) = O\left( \sum_{i=1}^n \sigma_i^2 \right).$$

Moreover, on a regular graph we have that $\delta_{ss} = \delta_{ss}^{\rm uni}$ so that $\delta_{ss}^{\rm uni}$ satisfies the same upper bound.

Summary. We provide a table to summarize all the bounds for $\delta_{ss}$ on concrete graphs obtained in the preceding subsections.

Graph                                $\delta_{ss}$
Complete                             $\sim (\sum_{i=1}^n \sigma_i^2)/n$
Line                                 $O(\sum_{i=1}^n \sigma_i^2)$
Ring                                 $O(\sum_{i=1}^n \sigma_i^2)$
Star                                 $\sim \sigma_1^2 + (1/n) \sum_{i=2}^n \sigma_i^2$
Two-star                             $\sim n(\sigma_1^2 + \sigma_n^2) + (1/n) \sum_{i=2}^{n-1} \sigma_i^2$
Starry line graph                    $\sim \sigma^2 n^2$ when $\sigma_i^2 = \sigma^2$ for all $i$
2D grid                              $(\sum_{i=1}^n \sigma_i^2)\, O((\log n)/n)$
kD grid with $k \ge 3$               $O(\sum_{i=1}^n \sigma_i^2)/n$
Complete binary tree                 $(\sum_{i=1}^n \sigma_i^2)\, O((\log n)/n)$
Regular $\alpha$-expander graphs     $O(1/\alpha^2) \cdot (\sum_{i=1}^n \sigma_i^2)/n$
Dense Erdős-Rényi random graphs      $O(\sum_{i=1}^n \sigma_i^2)/n$
Regular dense graphs                 $O(\sum_{i=1}^n \sigma_i^2)/n$
Regular graphs                       $O(\sum_{i=1}^n \sigma_i^2)$


IV. Formation control from noisy relative position measurements


In this section we consider the problem of formation control from noisy relative position measurements, i.e., when each node can measure the (noisy) position of neighboring nodes relative to itself. We will show that, using Theorem 1, we can characterize the long-term performance of a class of natural protocols in this setting in terms of the Kemeny constant of an underlying graph.

We begin with a formal statement of the problem. Our exposition here closely parallels our earlier works [23], [24]. We consider $n$ nodes which start at arbitrary positions $p_i(0) \in \mathbb{R}^d$. As in the previous sections, there is a graph $G = (V, E)$, and now the goal of the nodes is to move into a formation which is characterized by certain desired differences along the edges of this graph.

Formally we associate with each edge $(i,j) \in E$ a vector $r_{ij} \in \mathbb{R}^d$ known to both nodes $i$ and $j$. A collection of points $p_1, \ldots, p_n$ in $\mathbb{R}^d$ are said to be "in formation" if for all $(i,j) \in E$ we have that $p_j - p_i = r_{ij}$. In the current section (i.e., in Section IV) we will be assuming that $G$ is a directed graph with the "bidirectionality" property that $(i,j) \in E$ implies $(j,i) \in E$; we will do this so that we may refer to $(i,j)$ and $(j,i)$ as distinct edges of the graph. Naturally, we will also assume $G$ is strongly connected.

Note that, given the vectors $r_{ij}$, there may not exist a collection of points in formation; that is, some collections of vectors $\{r_{ij}, (i,j) \in E\}$ may be thought of as "inconsistent." For example, unless $r_{ij} = -r_{ji}$ for all $(i,j) \in E$ the collection $\{r_{ij}, (i,j) \in E\}$ will clearly be inconsistent. Moreover, since the property of being in formation is defined through differences of position, any translate of a collection of points in formation is itself in formation.

We thus consider the following problem: a collection of nodes would like to repeatedly update their positions so that $p_1(t), \ldots, p_n(t)$ approaches some collection of points in formation. We assume that node $i$ knows $p_j(t) - p_i(t)$ for all of its neighbors $j$ at every time step $t$ and furthermore we assume a "first-order" model in which each node can update its position from step to step. The protocols we derive for this problem will not assume the presence of a centralized coordinate system common to all the nodes.

Fig. 1. The offsets shown on the left side of the figure define a "ring formation" with 4 nodes. On the right, we show the result of simulating Eq. (30) on this graph with all the weights $f_{ij}$ equal to 1/9 starting from random positions. We see that the nodes begin by moving close to the formation and spend the remainder of the time doing essentially a random walk in a neighborhood of the formation.

A considerable literature has emerged in the past decade 
spanning many variants of the formation control problem. We 
make no attempt to survey the vast number of papers that have 
been published on the topic and refer the interested reader to 
the surveys (27), (28) , ED. We stress that the problem setup 
we have just described is only one possible way to approach 
the formation control problem; a popular and complementary 
approach is to consider formations defined by distances $\|p_j - p_i\|_2$ rather than the relative positions $p_j - p_i$ (see e.g., [7],
[22], ED, G3)). In terms of the existing literature, our problem
setup here is closest to some of the models considered in tm 

(HI. ED. 0, ED- 

A natural idea is for the nodes to do gradient descent on the potential function $\sum_{(i,j) \in E} \|p_j - p_i - r_{ij}\|_2^2$. This leads to the update rule

$$p_i(t+1) = p_i(t) + \sum_{j \in N(i)} f_{ij} \left( p_j(t) - p_i(t) - r_{ij} \right), \tag{28}$$

where $\{f_{ij}\}$ are positive numbers that, for technical reasons, need to satisfy the step-size condition $\sum_{j \in N(i)} f_{ij} < 1$ for all $i$.

Note that this update may be implemented in a completely decentralized way as long as node $i$ knows the differences $p_j(t) - p_i(t)$ and the desired relative positions $r_{ij}$. Indeed, the above update allows node $i$ to translate knowledge of the differences $p_j(t) - p_i(t)$, which can be measured directly, into knowledge of the difference $p_i(t+1) - p_i(t)$, which in turn can be used to update the current position. In other words, this update may be executed without node $i$ ever knowing what the actual position $p_i(t)$ is.
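A minimal Python sketch of the noiseless update in Eq. (28) is given below for points in the plane. Each node's displacement is computed only from the relative positions of its neighbors and the desired offsets, in line with the decentralization remark above. The 3-node path, the target shape, the common weight of 1/4, and the starting positions are all hypothetical values chosen for illustration.

```python
def formation_step(p, edges, r, f):
    """One synchronous step of Eq. (28) in the plane.

    p: dict node -> (x, y); edges: list of directed pairs (i, j);
    r: dict (i, j) -> desired offset p_j - p_i; f: common weight f_ij.
    """
    new_p = {}
    for i, (xi, yi) in p.items():
        dx = dy = 0.0
        for a, j in edges:
            if a == i:
                xj, yj = p[j]
                rx, ry = r[(i, j)]
                # Only the measured difference p_j - p_i enters the update.
                dx += f * ((xj - xi) - rx)
                dy += f * ((yj - yi) - ry)
        new_p[i] = (xi + dx, yi + dy)
    return new_p

# Hypothetical target shape; offsets r_ij are read off from it.
target = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}
undirected = [(0, 1), (1, 2)]
edges = undirected + [(j, i) for i, j in undirected]   # bidirectional edge set
r = {(i, j): (target[j][0] - target[i][0], target[j][1] - target[i][1])
     for i, j in edges}
f = 0.25   # equals 1 / (2 * max degree), so the weights at each node sum to < 1

p = {0: (3.0, -1.0), 1: (-2.0, 4.0), 2: (0.5, 0.5)}
start_centroid = (sum(x for x, _ in p.values()) / 3,
                  sum(y for _, y in p.values()) / 3)
for _ in range(200):
    p = formation_step(p, edges, r, f)
end_centroid = (sum(x for x, _ in p.values()) / 3,
                sum(y for _, y in p.values()) / 3)
errors = [abs(p[j][0] - p[i][0] - r[(i, j)][0]) + abs(p[j][1] - p[i][1] - r[(i, j)][1])
          for i, j in edges]
```

After the iterations the edge errors should be negligible, and the centroid should be unchanged because the weights are symmetric and $r_{ij} = -r_{ji}$.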

It is easy to see that if there exists at least one collection of points in formation, then this control law works in the sense that all $p_i(t)$ converge and $p_j(t) - p_i(t) \to r_{ij}$ for all $(i,j) \in E$ (considerably stronger statements were proved in 0. ED). For completeness, let us sketch the proof of this simple claim now. If $\bar{p}_1, \ldots, \bar{p}_n$ is any collection of points in formation, then defining

$$u_i(t) := p_i(t) - \bar{p}_i,$$

we have that the $u_i(t)$ follow the update

$$u_i(t+1) = u_i(t) + \sum_{j \in N(i)} f_{ij} \left( u_j(t) - u_i(t) \right). \tag{29}$$

Let $P^{\rm form}$ be the unique stochastic matrix which satisfies $P^{\rm form}_{ij} = f_{ij}$ for all $(i,j) \in E$ (with all other off-diagonal entries equal to zero), and let $u^j(k)$ be the vector which stacks up the $j$'th entries of the vectors $u_1(k), \ldots, u_n(k)$. We thus have

$$u^j(k+1) = P^{\rm form} u^j(k), \quad \text{for all } j = 1, \ldots, d,$$

and it is now immediate that all $u_i(t)$ approach the same vector. This implies that all $p_i(t)$ approach positions in formation.

We now turn to the case where the formation control update of Eq. (28) is executed with noise; as we will see, under appropriate assumptions the performance of the (noisy) formation control protocol can be written as the $\delta_{ss}$ of a certain matrix. Specifically, we will consider the update

$$p_i(t+1) = p_i(t) + \sum_{j \in N(i)} f_{ij} \left( p_j(t) - p_i(t) - r_{ij} \right) + n_i(t). \tag{30}$$

The random vector $n_i(t)$ can arise if each node executes the motion that updates its position $p_i(t)$ imprecisely. Although our methods are capable of handling quite general assumptions on the noise vectors $n_i(t)$, for simplicity let us assume that $E[n_i(t)] = 0$, $E[n_i(t) n_i(t)^T] = \lambda_i^2 I$ for all $i, t$, and that $n_i(t_1)$ and $n_j(t_2)$ are independent whenever $t_1 \ne t_2$ or $i \ne j$.

Of course, once noise is added convergence to a multiple of the formation will not be possible; rather, we will be measuring performance by looking at the asymptotic distance to the closest collection of points in formation. For an illustration, we refer the reader to Figure 1 which shows a single run of Eq. (30) with four nodes. As can be read off from the figure, the nodes will move "towards the formation" when they are far away from it, but when they are close the noise terms $n_i(t)$ effectively preclude the nodes from moving closer and the nodes end up performing random motions in a neighborhood of the formation.

We next formally define the way we will measure the performance of the formation control protocol. Let $\bar{p}_1(t), \ldots, \bar{p}_n(t)$ be a collection of points in formation whose centroid is the same as the centroid of $p_1(t), \ldots, p_n(t)$, i.e.,

$$\frac{1}{n} \sum_{i=1}^n \bar{p}_i(t) = \frac{1}{n} \sum_{i=1}^n p_i(t).$$

It is easy to see that, as long as there exists a single collection of points in formation, such $\bar{p}_1(t), \ldots, \bar{p}_n(t)$ always exist, and in fact $\bar{p}_1(t), \ldots, \bar{p}_n(t)$ is the closest collection of points in formation to $p_1(t), \ldots, p_n(t)$. Therefore, we will measure the performance of the formation control scheme via the quantity

$$\mathrm{Form}(G, \{f_{ij}\}) := \limsup_{t \to \infty} \frac{1}{n} \sum_{i=1}^n E\left[ \|p_i(t) - \bar{p}_i(t)\|_2^2 \right].$$

In general, obtaining a combinatorial expression for $\mathrm{Form}(G, \{f_{ij}\})$ is an open problem. The next proposition describes a solution once again under the additional condition that the weights $\{f_{ij}\}$ are symmetric, i.e., $f_{ij} = f_{ji}$.

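Proposition 7 (Performance of Formation Control with Symmetric Weights as Steady-State Disagreement). Let $Q$ be the matrix defined by $Q_{ij} = \lambda_i^2 + \lambda_j^2$. If

• There exists at least one collection of points in formation.

• The underlying graph $G = (V, E)$ is bidirectional and connected.

• The numbers $\{f_{ij}, (i,j) \in E\}$ are positive and satisfy $\sum_{j \in N(i)} f_{ij} < 1$ for all $i$ and $f_{ij} = f_{ji}$ for all $(i,j) \in E$.

then

$$\mathrm{Form}(G, \{f_{ij}\}) = d \cdot \delta_{ss}\left( P^{\rm form},\ \frac{1}{n}\left( n\,\mathrm{Diag}(\lambda_1^2, \ldots, \lambda_n^2) - Q \right) + \frac{\sum_{i=1}^n \lambda_i^2}{n^2} \mathbf{1}\mathbf{1}^T \right).$$

Proof. We proceed by changing variables to

$$u_i(t) = p_i(t) - \bar{p}_i(t).$$

Observe that by definition

$$\frac{1}{n} \sum_{i=1}^n u_i(t) = 0. \tag{31}$$

Naturally, we also have that

$$\mathrm{Form}(G, \{f_{ij}\}) = \limsup_{t \to \infty} \frac{1}{n} \sum_{i=1}^n E\left[ \|u_i(t)\|_2^2 \right]. \tag{32}$$

We now observe that the symmetry of the weights $\{f_{ij}\}$ as well as the fact that $r_{ij} = -r_{ji}$ imply that

$$\frac{1}{n} \sum_{i=1}^n p_i(t+1) = \frac{1}{n} \sum_{i=1}^n p_i(t) + \frac{1}{n} \sum_{j=1}^n n_j(t),$$

which allows us to conclude that for all $i = 1, \ldots, n$,

$$\bar{p}_i(t+1) = \bar{p}_i(t) + \frac{1}{n} \sum_{j=1}^n n_j(t).$$

In turn, this implies that the quantities $u_i(t)$ are updated as

$$u_i(t+1) = u_i(t) + \sum_{j \in N(i)} f_{ij} \left( u_j(t) - u_i(t) \right) + n_i(t) - \frac{1}{n} \sum_{k=1}^n n_k(t). \tag{33}$$

Now for each $j = 1, \ldots, d$, define $u^j(t)$ to stack up the $j$'th components of the vectors $u_1(t), \ldots, u_n(t)$. We then have that Eq. (32) implies

$$\mathrm{Form}(G, \{f_{ij}\}) = \limsup_{t \to \infty} \sum_{j=1}^d \frac{1}{n} E\left[ \|u^j(t)\|_2^2 \right], \tag{34}$$

while Eq. (31) implies that for all $t$ and $j = 1, \ldots, d$,

$$\frac{1}{n} \mathbf{1}^T u^j(t) = 0, \tag{35}$$

and finally Eq. (33) implies

$$u^j(t+1) = P^{\rm form} u^j(t) + q^j(t), \tag{36}$$

where the noise vector $q^j(t)$ satisfies

$$E[q^j(t)] = 0,$$

$$E[q^j_k(t) q^j_m(t)] = -\frac{\lambda_k^2 + \lambda_m^2}{n} + \frac{\sum_{i=1}^n \lambda_i^2}{n^2} \quad \text{for all } k \ne m,$$

$$E[(q^j_k(t))^2] = \lambda_k^2 - \frac{2\lambda_k^2}{n} + \frac{\sum_{i=1}^n \lambda_i^2}{n^2} \quad \text{for all } k.$$

We may summarize these last three equations as

$$E\left[ q^j(t) (q^j(t))^T \right] = \frac{1}{n}\left( n\,\mathrm{Diag}(\lambda_1^2, \ldots, \lambda_n^2) - Q \right) + \frac{\sum_{i=1}^n \lambda_i^2}{n^2} \mathbf{1}\mathbf{1}^T. \tag{37}$$

Equations (36), (35), (34), (37) now immediately imply the proposition. □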

Summarizing, Proposition 7 characterizes the performance of a formation control protocol in terms of the $\delta_{ss}$ of an appropriately defined matrix. We can now apply Theorem 1 to obtain a characterization in terms of features of the underlying matrix. For simplicity, let us focus on the case when the noise covariances are the same at each node, i.e.,

$$E[n_i(t) n_i(t)^T] = \lambda^2 I \quad \text{for all } i = 1, \ldots, n. \tag{38}$$

In this case, our main result on formation control is as follows.

Theorem 8 (Long-term Performance of Noisy Formation Control with Symmetric Weights). Assuming Eq. (38) holds as well as all the assumptions of Proposition 7, we have that

$$\mathrm{Form}(G, \{f_{ij}\}) = d \cdot \lambda^2 \frac{K\left( (P^{\rm form})^2 \right)}{n}.$$

Proof. Having already established Proposition 7 and Theorem 1, all that is left is to combine them. Indeed, if we define

$$\Sigma^{\rm form} = \frac{\lambda^2}{n} \left( nI - \mathbf{1}\mathbf{1}^T \right),$$

then Proposition 7 for the case of equal covariances may be succinctly stated as

$$\mathrm{Form}(G) = d \cdot \delta_{ss}\left( P^{\rm form}, \Sigma^{\rm form} \right).$$

Since $P^{\rm form}$ is symmetric, we may apply the expression for $\delta_{ss}$ given by Theorem 1. However, observe that the right-hand side of that expression is linear in $\Sigma_w$, and plugging in $\Sigma_w = \mathbf{1}\mathbf{1}^T$ makes the right-hand side zero. Consequently,

$$\mathrm{Form}(G) = d \cdot \delta_{ss}\left( P^{\rm form}, \lambda^2 I \right).$$

We now appeal to Eq. (20) to complete the proof of this theorem. □


Thus the long-term performance of formation control is 
proportional to the Kemeny constant of an underlying matrix. 

We next focus on understanding how the performance of formation control scales with the underlying graph. Of course, there are many possible choices of symmetric $\{f_{ij}\}$ for any given undirected graph $G$. We consider the following choice, which is perhaps the simplest: we set all $f_{ij}$ where $(i,j) \in E$ to some fixed $\epsilon$. In order to satisfy the condition that $\sum_{j \in N(i)} f_{ij} < 1$ we need to choose $\epsilon$ strictly smaller than the reciprocal of the largest degree; to avoid trouble, we therefore choose

$$\epsilon = \frac{1}{2 \max_i d(i)}.$$

With this choice, $\mathrm{Form}(G, \{f_{ij}\})$ becomes only a function of the graph $G$, so that we will simply write $\mathrm{Form}(G)$ henceforth.
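As an illustration of how Theorem 8 can be evaluated for this choice of weights, the Python sketch below builds $P^{\rm form}$ with all $f_{ij}$ equal to $1/(2\max_i d(i))$ and computes $d \lambda^2 K((P^{\rm form})^2)/n$, obtaining the Kemeny constant from hitting times under the uniform stationary distribution of a symmetric matrix. The function names, the dense linear solver, and the complete graph on 4 nodes used as input are assumptions made for this example.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def hitting_time(M, i, j):
    """Expected number of steps for the chain M, started at i, to first reach j."""
    if i == j:
        return 0.0
    idx = [s for s in range(len(M)) if s != j]
    A = [[(1.0 if a == b else 0.0) - M[a][b] for b in idx] for a in idx]
    return solve(A, [1.0] * len(idx))[idx.index(i)]

def p_form(n, edges):
    """P^form with every edge weight equal to eps = 1 / (2 * max degree)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    eps = 1.0 / (2 * max(deg))
    P = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for i, j in edges:          # undirected edge list; weights are symmetric
        P[i][j] += eps
        P[j][i] += eps
        P[i][i] -= eps
        P[j][j] -= eps
    return P

def form_value(n, edges, d, lam2):
    """Evaluate d * lam2 * K((P^form)^2) / n, the expression in Theorem 8."""
    Q = matmul(p_form(n, edges), p_form(n, edges))
    K = sum(hitting_time(Q, 0, j) for j in range(n)) / n   # uniform pi = 1/n
    return d * lam2 * K / n

complete4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
value = form_value(4, complete4, d=2, lam2=1.0)
```

For the complete graph on 4 nodes, every off-diagonal entry of $(P^{\rm form})^2$ equals 2/9, so each hitting time is 9/2, $K = 27/8$, and the expression evaluates to 27/16 when $d = 2$ and $\lambda^2 = 1$.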

We can now use Theorem [8] to compute the performance 
of the above-described formation control protocol on various 
graphs. This requires the computation of hitting times, and 
this 


since 


something we have done in Section III 


can simply reuse the calculation we have already done (the 
present choice of coefficients fij is only a minor modification). 
We therefore omit an extended discussion and conclude this 
section with the following list. 


• If $G$ is the complete graph, $\mathrm{Form}(G) \simeq d\lambda^2$.

• If $G$ is the line graph, $\mathrm{Form}(G) \simeq dn\lambda^2$.

• If $G$ is the ring graph, $\mathrm{Form}(G) \simeq dn\lambda^2$.

• If $G$ is the 2D grid, $\mathrm{Form}(G) = d\lambda^2 O(\log n)$.

• If $G$ is the complete binary tree, $\mathrm{Form}(G) = d\lambda^2 O(\log n)$.

• If $G$ is the 3D grid, $\mathrm{Form}(G) \simeq d\lambda^2$.

• If $G$ is the star graph, then $\mathrm{Form}(G) = O(dn\lambda^2)$.

• If $G$ is the two-star graph, then $\mathrm{Form}(G) = O(dn\lambda^2)$.

• If $G$ is a regular $\alpha$-expander, then $\mathrm{Form}(G) = O(d\lambda^2/\alpha^2)$.

• If $G$ is a regular dense graph (recall this means that the
degree of each node is at least $\lfloor n/2 \rfloor$), then $\mathrm{Form}(G) \simeq d\lambda^2$.

• If $G$ is a regular graph, then $\mathrm{Form}(G) = O(dn\lambda^2)$.
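Some of the growth rates in this list can be checked numerically. The sketch below is an illustration under stated assumptions rather than the computation used in the paper: it builds the symmetric matrix with uniform weights $1/(2 \max_i d(i))$ on the edges (diagonal entries chosen so that rows sum to one), computes its Kemeny constant $K$ through the standard identity $K = \mathrm{tr}\big((I - P + \mathbf{1}\mathbf{1}^T/n)^{-1}\big) - 1$, and reports $K/n$. The per-node normalization $K/n$ is an assumption chosen because it reproduces the listed growth rates; the exact constant of Theorem 8 is not reproduced here.

```python
def uniform_weight_matrix(n, edges):
    """Symmetric stochastic matrix with weight 1 / (2 * max degree) on every edge."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    eps = 1.0 / (2 * max(deg))
    P = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        P[i][j] = P[j][i] = eps
    for i in range(n):
        P[i][i] = 1.0 - eps * deg[i]
    return P

def invert(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def kemeny_constant(P):
    """K = trace((I - P + 11^T / n)^(-1)) - 1; P symmetric, so pi is uniform."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - P[i][j] + 1.0 / n for j in range(n)]
         for i in range(n)]
    Z = invert(A)
    return sum(Z[i][i] for i in range(n)) - 1.0

# Edge lists for a few of the graph families in the list above.
FAMILIES = {
    "complete": lambda n: [(i, j) for i in range(n) for j in range(i + 1, n)],
    "line": lambda n: [(i, i + 1) for i in range(n - 1)],
    "ring": lambda n: [(i, (i + 1) % n) for i in range(n)],
    "star": lambda n: [(0, k) for k in range(1, n)],
    "binary tree": lambda n: [((k - 1) // 2, k) for k in range(1, n)],
}

def per_node_kemeny(name, n):
    return kemeny_constant(uniform_weight_matrix(n, FAMILIES[name](n))) / n

for name in FAMILIES:
    print(name, [round(per_node_kemeny(name, n), 2) for n in (15, 31, 63)])
```

Under these assumptions the ratio stays bounded for the complete graph, roughly doubles with each doubling of $n$ for the line, ring and star graphs, and grows much more slowly for the complete binary tree.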


V. Simulations 

We now present simulations intended to demonstrate
how the scalings we have derived manifest themselves
in concrete formation control problems. Indeed, a central
consequence of our results is that some graphs are better 
than others by orders of magnitude. We note that similar 
observations have been made in the previous literature for a 
number of concrete graphs; a notable reference is Q which 
considered grids with constant spacing and demonstrated a 
dramatic difference between the line graph and the 2D and 
3D grids. 

We focus here on the star graph (where $\mathrm{Form}(G) =
O(dn\lambda^2)$) and on the complete binary tree (where $\mathrm{Form}(G) =
O(d\lambda^2 \log n)$). Figures 2 and 3 demonstrate the difference
between the logarithmic and linear scaling with the number
of nodes. In Figure 2 we see a single run of both protocols
with seven nodes; the noise here is rather tiny, $\lambda^2 = 1/2500$,
whereas all the relative positions have magnitude 1 for the star
graph and at least one for the binary tree. It might be expected
that such a small noise would make relatively little difference,
and indeed both formations seem to do reasonably well.

We need a quantitative measure of performance in order to
make the last statement precise, which we define as follows.
Taking the final positions $p_1^{\rm final}, \ldots, p_n^{\rm final}$ after a given run,
we define, as in Section IV, the positions $\hat{p}_1^{\rm final}, \ldots, \hat{p}_n^{\rm final}$
to be positions in formation with the same centroid as
$p_1^{\rm final}, \ldots, p_n^{\rm final}$. We then define

$$\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final}) := \frac{1}{n} \sum_{i=1}^{n} \left\| p_i^{\rm final} - \hat{p}_i^{\rm final} \right\|_2^2.$$
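The empirical measure just defined, the per-node squared distance between the final positions and the in-formation positions with the same centroid, can be sketched as follows. The sketch assumes that formations are determined up to translation, so that the in-formation positions are obtained by shifting any reference placement of the formation; the names `formation_error` and `reference` are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def formation_error(final, reference):
    """(1/n) * sum_i ||p_i - p_hat_i||^2, where p_hat is `reference`
    translated so that its centroid coincides with that of `final`."""
    shift = [a - b for a, b in zip(centroid(final), centroid(reference))]
    total = 0.0
    for p, q in zip(final, reference):
        total += sum((pk - (qk + sk)) ** 2 for pk, qk, sk in zip(p, q, shift))
    return total / len(final)

# Reference placement: unit square (any translate describes the same formation).
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# A pure translation of the formation has zero error.
translated = [(3.0, -2.0), (4.0, -2.0), (4.0, -1.0), (3.0, -1.0)]
print(formation_error(translated, reference))

# Displacing a single node by 0.1 gives a small positive error.
perturbed = [(0.1, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(formation_error(perturbed, reference))
```

Matching centroids is also the translation that minimizes the sum of squared distances, which is why this quantity can be read as a distance to the closest formation.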

The quantity $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final})$ may be thought of as a
measure of performance: it is the per-node squared distance
to the closest optimal formation. Returning to Figure 2, we
see that $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final})$ is quite small for both
formations. However, as we scale up to $n = 127$ in Figure 3,
we now see that $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final})$ grows much
faster on the star formation than on the tree formation, which
results in a dramatic difference in performance. In particular,
we see that even a tiny noise with $\lambda^2 = 1/2500$ essentially
overwhelms the star formation.
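A comparison of this kind can be imitated with a short simulation. The sketch below does not reproduce Eq. (30) or the figures; it assumes that, in coordinates $e_i = p_i - \hat{p}_i$ measuring the deviation from a fixed in-formation placement, the noisy protocol reduces to a consensus update with weight $1/(2 \max_i d(i))$ on every edge and additive Gaussian noise of standard deviation 1/50 per coordinate. The function names, the seed and the zero initial condition are illustrative choices.

```python
import random

def star_edges(n):
    return [(0, k) for k in range(1, n)]

def binary_tree_edges(n):
    # Node k > 0 is attached to its parent (k - 1) // 2.
    return [((k - 1) // 2, k) for k in range(1, n)]

def simulate(n, edges, steps=2000, noise_std=1 / 50, dim=2, seed=0):
    """Run e(t+1) = P e(t) + w(t) from e(0) = 0 and return the per-node
    squared deviation of e from its centroid after the last step."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    eps = 1.0 / (2 * max(len(a) for a in nbrs))
    e = [[0.0] * dim for _ in range(n)]
    for _ in range(steps):
        # Synchronous update: every node uses the previous values of its neighbors.
        e = [[e[i][k]
              + eps * sum(e[j][k] - e[i][k] for j in nbrs[i])
              + rng.gauss(0.0, noise_std)
              for k in range(dim)]
             for i in range(n)]
    c = [sum(row[k] for row in e) / n for k in range(dim)]
    return sum((row[k] - c[k]) ** 2 for row in e for k in range(dim)) / n

star_err = simulate(127, star_edges(127))
tree_err = simulate(127, binary_tree_edges(127))
print("star:", star_err, "tree:", tree_err)
```

Because only the deviation from the centroid is measured, the random drift of the centroid itself does not enter the reported error. The exact values depend on the seed and on the modeling assumptions above, so they should not be expected to match the numbers reported for the figures.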


Fig. 2. [Panels: "Star Formation, 7 nodes" (left) and "Tree Formation, 7 nodes" (right).] On the left we show a single run of Eq. (30) on a star formation on seven nodes, while on the right we show the same for the tree formation. Both
plots show positions from a single run with $w(t) = (1/50) X(t)$ where $X(t)$ are i.i.d. standard Gaussians; each plot shows 22 positions from about 2000
iterations. Although this is hard to tell with the naked eye, the protocol performs a little better on the star formation here; for the collection of final positions
$p_1^{\rm final}, \ldots, p_n^{\rm final}$, we have that $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final}) \approx 5 \cdot 10^{-4}$ on the star formation, while $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final}) \approx 0.001$ on the tree
formation.

Fig. 3. [Panel: "Tree Formation, 127 nodes" (right).] On the left we show a single run of Eq. (30) on a star formation on 127 nodes, while on the right we show the same for the tree formation. Both plots
show positions from a single run with $w(t) = (1/50) X(t)$ where $X(t)$ are i.i.d. standard Gaussians; each plot shows 22 positions from about 2000 iterations.
We note that the superior appearance of the protocol on the tree formation is not merely due to the increased horizontal spread (see axis labels); in fact, we
have that $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final}) \approx 0.049$ on the star formation, while $\mathrm{Form}(G, p_1^{\rm final}, \ldots, p_n^{\rm final}) \approx 0.0049$ (an order of magnitude smaller) on the
tree formation.


VI. Conclusion

The main contributions of this paper are three-fold. First,
we have given an explicit expression for the weighted steady-state
disagreement in reversible stochastic linear systems in
terms of the stationary distribution and hitting times of appropriate
Markov chains. Second, we have given the best currently
known bounds for unweighted steady-state disagreement in
terms of the same quantities. Finally, we have shown how
the Kemeny constant characterizes the performance of a class
of noisy formation control protocols. Additionally, we have
worked out weighted steady-state disagreement over a number
of common graphs.

An open question is whether similar results might be
obtained without the technical assumption of reversibility.
Furthermore, the question of obtaining an exact "combinatorial"
expression for the unweighted steady-state disagreement is also open.
Finally, it is also interesting to wonder how the results we have
presented here might be extended to time-varying linear systems.


More broadly, we wonder whether one can find more
connections between probabilistic or combinatorial quantities
and the behavior of linear systems. Indeed, we would argue
that the past decade of research on distributed control has
highlighted the importance of studying linear systems on graphs.
Relating classical quantities of interest in control theory, such
as stability and noise robustness, to the combinatorial features
of the graphs underlying the system could have significant
repercussions in the control of multi-agent systems.